Preface


A general statement about the background, purpose, assumptions, and
scope of the Handbook of Mathematical Psychology can be found in the
Preface to Volume I.

Of the seven chapters of this volume, the first five are concerned with
mathematical applications to more-or-less traditional psychological
problems, and so they do not require any special comment. The last two,
on stochastic processes and functional equations, exposit purely
mathematical matters. Although several books on stochastic processes
exist, we are inclined to believe that a discussion which lies somewhere
in difficulty between an elementary text and an advanced treatise, such
as Doob's,¹ will be useful for many of those applying stochastic models
to psychological problems. Even fewer expositions can be found in the
area of functional equations. The only book of which we are aware,
Aczél's,² is in German (fortunately, an English revision will soon be
available) and is, perhaps, not within easy mathematical reach of the
majority of those working in mathematical psychology. There seemed,
therefore, reason to include a brief outline of some of the basic results
about certain classes of functional equations that have played a role in
contemporary mathematical psychology.

Our editorial efforts have been greatly eased by Mrs. Sally Kraska, who
handled the numerous details of coordinating the authors, publisher, and
editors, and by Mrs. Kay Estes, who has skillfully prepared the index for
this as well as the two preceding volumes. We thank them both.

Finally, we are indebted to the Universities of Pennsylvania and
Washington, to the National Science Foundation, and to the Office of
Naval Research for the support, intellectual and personal as well as
financial, that they have provided us throughout this project.


Philadelphia, Pennsylvania
September 1964


R. Duncan Luce
Robert R. Bush
Eugene Galanter


¹ Doob, J. L. Stochastic processes. New York: Wiley, 1953.

² Aczél, J. Vorlesungen über Funktionalgleichungen und ihre Anwendungen.
Basel and Stuttgart: Birkhäuser, 1961.
1. This work was supported in part by Contract No. B-3950 between the National
Institute of Neurological Diseases and Blindness, Public Health Service, and
Syracuse University.

2. There are several persons I would like to thank for their help in preparing the
chapter. They have no responsibility for its shortcomings. My secretary, Mrs.
Carole Selkirk, typed the first draft, combining speed with care, infinite patience,
and good disposition; Mr. Erdogan Oikaraii^ copied all the mathematical
formulas; and Mr. Allen Ayres and Mr. Charles Reiner did a superb job on the
ink drawing and photocopying of the many figures. My very special gratitude goes
to my wife, who edited parts of the chapter, proofread the whole, and gave me the
moral support necessary for writing it despite the extreme dearth of time.

I would also like to thank Dr. David M. Green for his careful review of the
chapter and valuable suggestions.

3. In a not very successful effort to meet a deadline, the chapter has certain obvious
omissions in its content as well as form. The most important deletion that had to be
made concerns interaural effects. Readers of the Handbook are referred to an
excellent article on the subject by J. L. Hall and C. M. Pyle in the Readings, Vol. II.
Analysis of Some Auditory Characteristics


In the second half of the nineteenth, and even in the first quarter of the
twentieth century, this chapter might have been called “Theory of Hearing.”
Past the halfway mark of the twentieth century, the accumulated empirical
knowledge inspires more prudence. We know too much to delude ourselves
that we understand the phenomenon of hearing as a whole. A
physiological or pseudo-physiological explanation of one or another
auditory attribute should not be misconstrued as a theory of the total
process. Hypotheses about such a process are possible, but they should
await confirmation by a sufficient body of empirical data before being
called theories.

This chapter is traditional in that it attempts to account for some of the
most prominent auditory characteristics in terms of physiological
mechanisms. The first characteristic so analyzed is pitch as a function of
sound frequency, the classical subject of the “theories of hearing.” The
following section is devoted to the effects of sound frequency and temporal
stimulus patterns on the threshold of audibility. The threshold is defined
in this context as the stimulus intensity that produces a 50% chance of
signal detection under constant experimental conditions. Further sections
deal with masking, contrast, and critical band phenomena, and with
loudness. All these sections are preceded by an elementary theory of the
sound stimulus and by a theory of sound transmission in the ear.

Readers with a background in physical acoustics should be able to
omit the first section. The second section, which is purely physical and
physiological, may appear too long for a handbook of mathematical
psychology. It is included, nevertheless, because it leads to the
specification of the auditory stimulus at the level of sensory cells and
constitutes a foundation for further psychophysiological analysis. Several
auditory characteristics depend to a large extent on the stimulus
transformation between the entrance to the auditory canal and the sensory
cells in the cochlea, as may be seen from the following examples. The
interrelationship between pitch and sound frequency appears to be
determined primarily by the location of the vibration maximum in the
cochlea. Frequency discrimination seems to be controlled in part by the
shift of this location with frequency and by the sharpness of the vibration
maximum. The critical bands, which appear to result from a self-adapting
filter mechanism, may be shown to be correlated with the distribution of
the vibration amplitude along the cochlear partition. The mechanical
transmission characteristic of the ear has a marked effect on the auditory
sensitivity as a function of frequency.

There is another reason for emphasis on the mechanical transmission
characteristic. Although some may regard it as almost trivial, it is in
fact very complex and not completely known. Eliminating it from the
total auditory process, by specifying mathematically the stimulus at the
sensory cells, considerably simplifies the analysis of the neural coding
of auditory information and leads to worthwhile psychophysiological
conclusions.

In the sections dealing with psychological characteristics of hearing, an 
effort has been made to avoid purely hypothetical postulates and to remain 
as close as possible to empirical evidence. The derived theorems have been 
validated by means of numerical comparison with experimental data. 

In general, the mathematical theory has been used for interpolation
rather than extrapolation. At present, it appears more important to
integrate the already observed wealth of phenomena into a coherent
system rather than to predict new ones. For the limited scope of this
chapter, such a system seems to have been achieved: no mutually
inconsistent postulates have been accepted, and no theorems have been
found to be in conflict with each other. Admittedly, absence of
inconsistency does not necessarily mean consistency; it could mean mutual
independence. This situation does not arise, however, and a meaningful
overlap does exist.

All postulates necessary for the calculation of the pitch function from
the physical and anatomical constants of the ear enter into the calculation
of the absolute threshold as a function of frequency. With one exception,
all added factors have been derived in a straightforward manner from
purely acoustic measurements. The exception is a theorem which follows
from a theory of temporal auditory summation. This theory has been
derived from one psychoacoustic experiment dealing with the threshold
of audibility for pairs of short pulses. It has accurately predicted the
threshold for all temporal stimulus patterns used so far; it also applies
to other sense modalities and to motor activity. In the last section of the
chapter, it is shown that the same theory holds for temporal loudness
summation. The loudness function is combined with the theory of
temporal summation with the help of three postulates: (1) that the
absolute threshold of audibility is controlled by the intrinsic noise of the
auditory system, (2) that the rate of neural firing decays with stimulus
duration, and (3) that the loudness function is independent of stimulus
duration except for a multiplicative constant. The first two postulates are
widely accepted, and there is good physiological evidence for them. It is
shown that the first agrees numerically with the masking function near the
absolute threshold, and that both are numerically consistent with
physiological measurements on the peripheral auditory system of monkeys
and cats. The empirical evidence for the third has begun to accumulate.
The steady-state loudness function itself is derived from S. S. Stevens'
power law, the postulate of intrinsic noise, and the phenomenon of
additivity of stimulus power within a critical band. A further postulate
of central loudness additivity has been at least approximately verified by
a substantial number of experiments. The loudness function appears to
agree with the measured firing rate in the auditory nerve.

Somewhat more independent of other psychoacoustic phenomena is the
demonstration that the critical band may be accounted for by v. Békésy's
neural unit developed for the eye and the skin. Nevertheless, even this
demonstration relies heavily on the calculated distribution of the vibration
amplitude in the cochlea.

Because of their overlap and mutual consistency, the various derivations
of the chapter may be considered as parts of one theory.


1. MECHANICAL VIBRATION AND SOUND PROPAGATION

In psychoacoustics, human or animal responses are correlated with
physical stimulus parameters. As a consequence, it is not sufficient to
describe the behavior of experimental subjects; a thorough knowledge
of the stimulus is also necessary. In quantitative experiments, the stimulus
must be described in mathematical terms.

The auditory stimulus consists of mechanical vibrations propagated in
the form of waves. The vibration may be produced mechanically by
striking an elastic object capable of vibration, or electromechanically by
activating such an object by means of an electrostatic or an electromagnetic
field. Strings, bars, membranes, and thin plates are used for the purpose.
Modern psychoacoustics almost exclusively uses electromechanical
instruments, called electroacoustic transducers. Depending on whether
they are coupled to the ear by means of a small volume of air or radiate
sound into larger enclosures, like rooms or auditoriums, they are called
earphones or loudspeakers. One advantage of electroacoustic transducers
is that their acoustic output can be easily controlled within a considerable
range of sound frequencies and intensities.

Sound waves can be measured by microphones, which are electroacoustic
transducers converting mechanical energy into electrical energy.
They may be regarded as earphones or loudspeakers working in reverse.
The function of the peripheral portion of the auditory system is similar
to that of microphones, and some of the mechanically active parts of the
ear resemble known microphone structures. Because of this similarity,
the mathematical theory that applies to electroacoustic transducers is
also useful in analyzing the sound-transmitting system of the ear.


1.1 Mathematical Nomenclature

We now shall develop some of the theoretical concepts of acoustics,
especially those of forced vibration, impedance, electroacoustic and
electromechanical analogies, and of wave propagation. The development
is limited to the simple harmonic functions, i.e., sine and cosine functions.
For many purposes, it is more convenient, however, to express such
functions in complex nomenclature and to regard the sine and cosine
functions as projections of a special vector, called a phasor, on two
orthogonal axes. The convention is illustrated in Fig. 1. The real
numbers are plotted along the horizontal axis, the imaginary numbers
along the vertical one. The phasor of magnitude $|r|$ is at a phase angle
$\theta$ from the real axis. Then the horizontal projection has the magnitude

$$x = |r| \cos\theta, \tag{1}$$

and the vertical projection

$$y = |r| \sin\theta, \tag{2}$$

with $x_{\max} = y_{\max} = |r|$ called the amplitude.

The phasor $r$ may be expressed as a vector sum of the orthogonal
components,

$$r = |r| (\cos\theta + j \sin\theta). \tag{3}$$

Fig. 1. Phasor in a complex plane.
It can be shown that such a sum is equal to $|r| e^{j\theta}$, and this symbol
is used to describe the phasor. One has to remember, however, that the actual
physical situation, that is, the position of a vibrating body, is described
by the projection of the phasor on one of the orthogonal axes rather than
by the phasor itself. We take either the real or the imaginary part of the
phasor and write

$$x = \mathrm{Re}\,(|r| e^{j\theta}) \quad \text{or} \quad y = \mathrm{Im}\,(|r| e^{j\theta}). \tag{4}$$

By making the angle $\theta$ proportional to time, which means a rotation of
the phasor $r$ with a constant velocity, we obtain

$$r = |r| e^{j\omega t} = |r| (\cos\omega t + j \sin\omega t), \tag{5}$$

where $\cos\omega t$ and $\sin\omega t$ are simple harmonic functions of time and
$\omega$ is the angular velocity. Denoting one full rotation by $2\pi$ radians and
the number of rotations per second (frequency) by $f$, we have
$\omega = 2\pi f$. The time needed for a complete rotation, $T = 1/f$, is called
the period. There is no need to draw a sinusoidal curve here; instead, we can
take another look at Fig. 1. Let us consider $\theta$ to be the phase angle at
the time $t = 0$; then

$$r = |r| e^{j(\omega t + \theta)} = |r| [\cos(\omega t + \theta) + j \sin(\omega t + \theta)]. \tag{6}$$

When a particle of matter vibrates around its equilibrium position so that
at the time $t = 0$ its excursion is zero, which is usually identified with
$\theta = 0$, its position at any time is defined by

$$y = |r| \sin\omega t.$$

If the initial position of the particle is at $y = |r|$, and therefore
$\theta = \pi/2$, its motion follows the function

$$y = |r| \sin\left(\omega t + \frac{\pi}{2}\right) = |r| \cos\omega t.$$


Restriction of the developments of this section to simple harmonic
functions does not mean loss of generality, provided that we deal with
linear systems, that is, provided that the intervening forces remain
proportional to kinematic parameters. This condition is usually fulfilled in
acoustics, which deals with small oscillations. According to the Fourier
theorem, any periodic function can be represented by a sum of simple
harmonic components,

$$f(t) = \sum_{n=-\infty}^{\infty} A_n e^{jn\omega t}, \tag{7}$$
and any aperiodic function can be represented within a finite interval of
time by a continuous frequency spectrum, such that

$$f(t) = \int_{-\infty}^{\infty} g(\omega) e^{j\omega t}\, d\omega. \tag{8}$$

As a consequence, any linear system may be fully described in terms of
simple harmonic functions. More detailed information on Fourier
analysis as well as frequency spectra of various functions can be found,
for instance, in The Fourier Integral and Its Applications (Papoulis, 1962).
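
The phasor convention of this section can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the numerical values of amplitude and frequency are assumptions chosen only for illustration. It builds the rotating phasor of Eq. 6 and confirms that its imaginary projection is a sine for a zero initial phase and a cosine for an initial phase of π/2.

```python
import cmath
import math


def phasor(amplitude, freq_hz, t, phase=0.0):
    """Rotating phasor |r| * exp(j*(omega*t + theta)), as in Eq. 6."""
    omega = 2.0 * math.pi * freq_hz  # angular velocity, omega = 2*pi*f
    return amplitude * cmath.exp(1j * (omega * t + phase))


def displacement(amplitude, freq_hz, t, phase=0.0):
    """Physical position: the vertical (imaginary) projection of the phasor."""
    return phasor(amplitude, freq_hz, t, phase).imag


# Hypothetical values, chosen only for illustration.
r, f = 2.0, 100.0
T = 1.0 / f          # period
t = 0.3 * T          # an arbitrary instant within one period

y_sine = displacement(r, f, t)                       # theta = 0
y_cosine = displacement(r, f, t, phase=math.pi / 2)  # theta = pi/2
```

The magnitude of the phasor stays equal to the amplitude at all times; only its projections oscillate.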


1.2 Forced Oscillation

Let us now consider a simple mechanical system consisting of a rigid
plate suspended elastically in a heavy frame (Fig. 2a), and a mechanical,
electrostatic, or electromagnetic force $F$ acting on the plate perpendicularly
to its surface. If the force is periodic, the plate will move up and down
like a piston. Such a system in a somewhat more complex form is found
in many electroacoustic transducers whose diaphragm vibrates in almost
the same way as a rigid piston at sufficiently low frequencies. The system
may be represented symbolically by a mass, a spring, and a frictional
element, as shown in Fig. 2b. When the mass is pulled away from its
equilibrium position to a position $x$, the spring is stretched and produces
an opposing force $sx$ that tends to bring the mass back to its equilibrium
position. If the motion occurs with a velocity $u = dx/dt$, friction produces
an opposing force $-r(dx/dt)$. Finally, the acceleration of the mass leads to
an inertial force $-m(d^2x/dt^2)$. In general, all forces act at the same time,
and a dynamic equilibrium is established when

$$m \frac{d^2x}{dt^2} + r \frac{dx}{dt} + sx = F. \tag{9}$$

Fig. 2. A simple mechanical system: a rigid plate suspended elastically in a
heavy frame. (a) The mechanical structure; (b) its symbolic representation.


The expression is an inhomogeneous linear differential equation whose
solution consists of the general solution of the homogeneous part and of a
particular solution of the whole equation. The solution of the
homogeneous equation,

$$m \frac{d^2x}{dt^2} + r \frac{dx}{dt} + sx = 0, \tag{10}$$

is

$$x = A_1 e^{(-\alpha + j\omega_0)t} + A_2 e^{(-\alpha - j\omega_0)t}, \tag{11}$$

where

$$\alpha = \frac{r}{2m} \quad \text{and} \quad \omega_0 = \sqrt{\frac{s}{m} - \frac{r^2}{4m^2}}.$$

By selecting the constants $A_1$ and $A_2$ so that the right-hand expression of
Eq. 11 is real, the equation can be transformed into a trigonometric
function,

$$x = A e^{-\alpha t} \cos(\omega_0 t + \varphi). \tag{12}$$

It should be noted that Eq. 12 describes an oscillation with an amplitude
decreasing according to the function $e^{-\alpha t}$. The amplitude decay becomes
more rapid with increasing $\alpha$. The rapidity of amplitude decay is often
expressed as a natural logarithm of the ratio of two consecutive
excursions on the same side of equilibrium. Since such excursions are spaced
in time by one period, $T_0 = 2\pi/\omega_0$, we obtain

$$\ln \frac{e^{-\alpha t}}{e^{-\alpha(t + T_0)}} = \alpha T_0 = \frac{2\pi\alpha}{\omega_0}. \tag{13}$$

The expression is called a logarithmic decrement.

For the particular solution of the inhomogeneous equation, we assume
that the external force is a simple harmonic function of time, $F e^{j\omega t}$.
The particular solution then becomes

$$x = \frac{F e^{j\omega t}}{j\omega r + (s - \omega^2 m)}, \tag{14}$$

and the total solution can be written in the form

$$x = A e^{-\alpha t} \cos(\omega_0 t + \varphi) + \frac{F e^{j\omega t}}{j\omega r + (s - \omega^2 m)}. \tag{15}$$
The first term decays after a while and constitutes the so-called transient,
well known to anybody who has worked in psychoacoustics. Transient
responses of earphones or loudspeakers are often heard as clicks and may
interfere with the measurement of auditory responses. Because of them,
most auditory stimuli have to be turned on gradually, so that it is
difficult to measure short-duration effects. The second term of Eq. 15
describes a sustained oscillation whose amplitude and phase with respect
to the amplitude and phase of the driving force depend on frequency and
on the parameters m, s, and r.
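
The two parts of the solution can be explored with a short numerical sketch. It is an illustration under stated assumptions rather than anything given in the text: the function names are hypothetical, the decay constant is taken as r/2m and the free oscillation frequency as the square root of s/m minus the squared decay constant (the standard result for the damped oscillator of Eq. 10), and the parameter values are arbitrary. The last function substitutes the steady-state term back into Eq. 9 and returns what is left over, which should vanish.

```python
import math


def free_decay_parameters(m, r, s):
    """Decay constant, free angular frequency, and logarithmic decrement."""
    alpha = r / (2.0 * m)
    omega0 = math.sqrt(s / m - alpha ** 2)  # real only for an underdamped system
    period = 2.0 * math.pi / omega0
    return alpha, omega0, alpha * period


def steady_state_displacement(F, omega, m, r, s):
    """Complex amplitude of x for a driving force F*exp(j*omega*t)."""
    return F / (1j * omega * r + (s - omega ** 2 * m))


def equation_9_residual(F, omega, m, r, s):
    """Left side minus right side of Eq. 9 for the steady-state term.

    For exp(j*omega*t), d/dt becomes j*omega and d2/dt2 becomes -omega**2;
    the common time factor is divided out.
    """
    X = steady_state_displacement(F, omega, m, r, s)
    return m * (-(omega ** 2)) * X + r * (1j * omega) * X + s * X - F


# Hypothetical parameters (SI units assumed), for illustration only.
m, r, s = 0.01, 0.5, 4000.0
alpha, omega0, decrement = free_decay_parameters(m, r, s)
leftover = equation_9_residual(1.0, 500.0, m, r, s)
```

A larger frictional constant r raises the decay constant, so the transient dies out faster, at the price of a smaller steady-state amplitude near resonance.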


1.3 Concept of Impedance

The steady-state part of Eq. 15 can be expressed in terms of velocity of
motion rather than of displacement, and we obtain

$$u = \frac{dx}{dt} = \frac{F e^{j\omega t}}{r + j(\omega m - s/\omega)}, \tag{16}$$

or

$$\frac{F e^{j\omega t}}{u} = r + j\left(\omega m - \frac{s}{\omega}\right). \tag{17}$$

The ratio of the driving force to the velocity is usually denoted by $Z_m$ and
is called mechanical impedance. The concept of mechanical impedance does not
apply exclusively to the structure of Fig. 2, but it may be generalized to any
mechanical structure driven at a certain point. As soon as the impedance is
known, it is possible to find the velocity produced by a given force without
having recourse to a differential equation,

$$u = \frac{F e^{j\omega t}}{Z_m}. \tag{18}$$

Since displacement is equal to the time integral of velocity, we can simply
write

$$x = \int u\, dt = \frac{F}{Z_m} \int e^{j\omega t}\, dt = \frac{F e^{j\omega t}}{j\omega Z_m}. \tag{19}$$

Similarly, the acceleration follows as

$$\frac{du}{dt} = \frac{j\omega F e^{j\omega t}}{Z_m}. \tag{20}$$




Note that the impedance term does not enter into the integration and
differentiation operations.

These few examples should suffice as a demonstration of the operational
simplification introduced by the concept of impedance. A further
advantage is that the impedance of any linear structure consisting of lumped
elements can be calculated by means of algebraic operations. For instance,
the impedance of the structure in Fig. 2 consists of three parts: the
impedance of the mass $m$, the impedance of the spring $s$, and the impedance
of the frictional element $r$. The expressions for these partial impedances
are $Z_1 = j\omega m$, $Z_2 = s/j\omega$, and $Z_3 = r$. The total impedance is
equal to the sum of the components,

$$Z_m = Z_1 + Z_2 + Z_3 = j\left(\omega m - \frac{s}{\omega}\right) + r. \tag{21}$$

The mechanical impedance generally consists of a real part $R$, called
resistance, and of an imaginary part $X$, called reactance. A purely
frictional element has an impedance consisting of a resistance; a mass
element, of a positive reactance; a stiffness element (spring), of a negative
reactance.

(See also Beranek, 1949.)
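
The algebraic character of the impedance calculation lends itself to a brief sketch. The code below is illustrative only: the function names and parameter values are assumptions, not taken from the text. It adds the three partial impedances of Eq. 21 and locates the frequency at which the mass and stiffness reactances cancel, so that the impedance reduces to the resistance alone.

```python
import math


def mechanical_impedance(omega, m, r, s):
    """Sum of the partial impedances of mass, spring, and friction (Eq. 21)."""
    z_mass = 1j * omega * m        # positive reactance
    z_spring = s / (1j * omega)    # negative reactance, equal to -j*s/omega
    z_friction = r                 # pure resistance
    return z_mass + z_spring + z_friction


def resonance_angular_frequency(m, s):
    """Angular frequency at which omega*m equals s/omega."""
    return math.sqrt(s / m)


def velocity_amplitude(F, omega, m, r, s):
    """Velocity amplitude from force amplitude, with no differential equation."""
    return F / mechanical_impedance(omega, m, r, s)


# Hypothetical parameters (SI units assumed), for illustration only.
m, r, s = 0.01, 0.5, 4000.0
omega_res = resonance_angular_frequency(m, s)
z_at_resonance = mechanical_impedance(omega_res, m, r, s)
```

Below this frequency the reactance is negative (stiffness dominates); above it the reactance is positive (mass dominates).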


1.4 Theory of Electromechanical and Electroacoustic Analogies

The concept of impedance was borrowed from electrical network
analysis, where it has proved to be extremely useful. This brings us to the
theory of electromechanical analogies. Let us consider an electrical
series circuit consisting of an inductance $L$, a resistance $R$, and a
capacitance $C$. The circuit can be described by the differential equation

$$L \frac{d^2q}{dt^2} + R \frac{dq}{dt} + \frac{q}{C} = e, \tag{22}$$

where $q$ means electric charge and $e$ the applied voltage. For a simple
harmonic voltage $e = E e^{j\omega t}$, the steady-state solution of this
equation is

$$q = \frac{E e^{j\omega t}}{j\omega R + (1/C - \omega^2 L)}, \tag{23}$$

and since electric current is a time derivative of electric charge,

$$i = \frac{dq}{dt} = \frac{E e^{j\omega t}}{R + j(\omega L - 1/\omega C)}. \tag{24}$$
Fig. 3. An electric series circuit (a), and its mechanical analog (b).

The electric impedance of the circuit follows as

$$Z_e = \frac{e}{i} = R + j\left(\omega L - \frac{1}{\omega C}\right). \tag{25}$$

It is easy to see that the equations describing the electrical series circuit
are analogous to those for the mechanical structure of Fig. 2, and that the
latter can be obtained from the former by simply replacing the voltage $E$
by the force $F$, the electric charge $q$ by the displacement $x$, etc. We obtain
the following analog pairs:

$$E \leftrightarrow F, \quad q \leftrightarrow x, \quad i \leftrightarrow u, \quad L \leftrightarrow m, \quad R \leftrightarrow r, \quad \frac{1}{C} \leftrightarrow s. \tag{26}$$

The electric circuit of Fig. 3a is an analog of the mechanical system of
Fig. 3b. We can find analog circuits for other systems. For instance, the
parallel circuit of Fig. 4a is an analog of the mechanical system of Fig. 4b.
The concept of impedance can also be applied to acoustic systems.
For instance, a constriction in a tube acts like a mass combined with a
resistance, a dilation like a stiffness or its reciprocal, the compliance.
Thus, the acoustic system of Fig. 5 is an analog of the mechanical system
of Fig. 2.
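
The series-circuit analogy can be verified directly. The sketch below is an illustration, not part of the original text: the function names and numerical values are assumptions. It maps mass to inductance, friction to resistance, and stiffness to reciprocal capacitance, as follows from comparing the differential equations of the mechanical system and of the series circuit, and confirms that both impedances then coincide at every frequency tried.

```python
def mechanical_impedance(omega, m, r, s):
    """Z_m = r + j*(omega*m - s/omega) for the mass-spring-friction system."""
    return r + 1j * (omega * m - s / omega)


def electrical_series_impedance(omega, L, R, C):
    """Z_e = R + j*(omega*L - 1/(omega*C)) for the series L-R-C circuit."""
    return R + 1j * (omega * L - 1.0 / (omega * C))


def electrical_analog(m, r, s):
    """Analog element values: L for m, R for r, and C for 1/s."""
    return {"L": m, "R": r, "C": 1.0 / s}


# Hypothetical mechanical parameters, for illustration only.
m, r, s = 0.01, 0.5, 4000.0
circuit = electrical_analog(m, r, s)

# Largest disagreement between the two impedances over a few frequencies.
mismatch = max(
    abs(mechanical_impedance(w, m, r, s)
        - electrical_series_impedance(w, circuit["L"], circuit["R"], circuit["C"]))
    for w in (10.0, 500.0, 5000.0)
)
```

Because the two impedances are identical functions of frequency, any result derived for the circuit carries over to the mechanical system by renaming the symbols.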

Fig. 4. An electric parallel circuit (a), and its mechanical analog (b).

In acoustics, the mechanical impedance is often replaced by the specific
impedance or the acoustic impedance. The specific impedance is defined
as the ratio of sound pressure to the velocity of motion. If in Fig. 2a we
denote the surface area of the moving plate by $A$ and assume that the
force $F$ is evenly distributed over that area, the sound pressure follows from

$$p = \frac{F}{A}. \tag{27}$$

The specific impedance is then

$$Z_{sp} = \frac{p}{u} = \frac{F}{Au}. \tag{28}$$

The acoustic impedance is defined as the ratio of sound pressure to
volume velocity. Again, in Fig. 2a, the volume velocity of the plate
amounts to $v = uA$, so that the acoustic impedance is

$$Z_a = \frac{p}{v} = \frac{F}{A^2 u}. \tag{29}$$

Summarizing,

$$Z_{sp} = \frac{Z_m}{A}, \quad \text{and} \quad Z_a = \frac{Z_m}{A^2}. \tag{30}$$

Fig. 5. An acoustic resonator. It is an analog of the mechanical system of
Fig. 2 and of the electrical circuit of Fig. 3.

The concept of acoustic impedance can be very helpful in the control
of the auditory stimulus and also in the analysis of the acoustic function
of the ear. When an earphone is placed over the pinna, the sound pressure
it generates at the entrance to the ear canal is equal to the volume velocity
of the diaphragm multiplied by the acoustic impedance of the ear as
measured at the level of the diaphragm,

$$p_e = v Z_{ae}. \tag{31}$$

The volume velocity depends, in turn, on the electromagnetic force
acting on the diaphragm and on the sum of the acoustic impedances of the
diaphragm and of the ear,

$$v = \frac{F}{A (Z_{ad} + Z_{ae})}. \tag{32}$$

Consequently, the sound pressure at the ear amounts to

$$p_e = \frac{F}{A} \frac{Z_{ae}}{Z_{ad} + Z_{ae}}. \tag{33}$$

If the earphone is calibrated in a standard way on a 6 cc coupler, the
sound pressure generated in the coupler is

$$p_c = \frac{F}{A} \frac{Z_{ac}}{Z_{ad} + Z_{ac}}, \tag{34}$$

and the ratio of sound pressures for a constant $F$, i.e., a constant voltage
across the earphone terminals, is

$$\frac{p_e}{p_c} = \frac{Z_{ae} (Z_{ad} + Z_{ac})}{Z_{ac} (Z_{ad} + Z_{ae})}. \tag{35}$$

This relation makes it possible to determine fairly accurately the sound
pressure in the ear canal once the earphone has been calibrated on the
standard coupler. Data for the average impedance of the ear are available,
and most of the time it is possible to assume that
$Z_{ad} \gg Z_{ac}, Z_{ae}$, so that

$$\frac{p_e}{p_c} \approx \frac{Z_{ae}}{Z_{ac}}. \tag{36}$$

In the analysis of sound transmission in the ear, the impedance at the
eardrum determines how much sound is absorbed by the ear and how much
is reflected back. It helps in calculating the frequency responses of the
middle and inner ear, permitting us to specify the stimulus at the level of
the sensory cells. By these means many psychoacoustic characteristics
may be explained. See Olson (1958).
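
The coupler correction can be sketched in a few lines. Everything named here is an assumption made for illustration: the symbols z_ae, z_ac, and z_ad stand for the acoustic impedances of the ear, the coupler, and the earphone diaphragm, the pressure ratio is taken as z_ae(z_ad + z_ac) / [z_ac(z_ad + z_ae)], and the impedance values are arbitrary complex numbers rather than measured data.

```python
def pressure_ratio(z_ae, z_ac, z_ad):
    """Ear-canal to coupler sound pressure ratio at constant driving force."""
    return (z_ae * (z_ad + z_ac)) / (z_ac * (z_ad + z_ae))


def pressure_ratio_large_diaphragm_impedance(z_ae, z_ac):
    """Limiting ratio when the diaphragm impedance dominates both loads."""
    return z_ae / z_ac


# Hypothetical impedances in arbitrary consistent units, not measured data.
z_ear = 3.0 + 4.0j
z_coupler = 1.0 - 2.0j

exact = pressure_ratio(z_ear, z_coupler, z_ad=1.0e9)
approx = pressure_ratio_large_diaphragm_impedance(z_ear, z_coupler)
```

When the ear and the coupler present the same impedance, the ratio is unity, and the coupler calibration applies to the ear without correction.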


1.5 Mechanical Wave Propagation and Reflection

The vibration of a mechanical instrument is ordinarily transmitted to
the ear by waves propagated through the ambient air. In the event that a
vibrator is applied directly to the head, the transmission can occur through
the skull bones. The second mode is used in some clinical tests but is of
limited value in psychoacoustics, so that we shall focus our attention on
the first.


Vibration of a plate or a similar object produces compressions and
dilations of air directly in front of it. These changes of pressure, which are
accompanied by motion of air particles, spread through the medium as a
result of elastic restoring forces acting among adjacent particles. The
particles themselves oscillate around their equilibrium positions and do
not change their average positions. In contradistinction to electromagnetic
waves, for example light waves, mechanical waves cannot be propagated in
vacuum; they require a material medium. They can be generated in fluids as
well as in solids provided that a deformation or compression of the medium
evokes a restoring force. Except for gravity waves, such a restoring force
is elastic in nature and can usually be described by Hooke's law for solids
and by the bulk modulus of elasticity for fluids. Hooke's law simply states
that the restoring force is proportional to deformation. Thus

$$F = -YS \frac{\partial\xi}{\partial x}, \tag{37}$$

where $F$ is the restoring force, $Y$ the proportionality constant called
Young's modulus, $S$ the cross-sectional area perpendicular to the force,
and $d\xi$ the elongation of the elemental length $dx$.

The bulk modulus of elasticity is defined by

$$B = -\frac{dP}{dV/V}, \tag{38}$$

where $dP$ is the change of pressure and $dV$ is the associated change of
volume $V$. The bulk modulus can be derived from the general gas law and
is equal to

$$B = \gamma P, \tag{39}$$

where $\gamma$ is the ratio of specific heats and $P$ is the static pressure.

The change of volume or compression of fluid is described by the
continuity equation, which states that the divergence of the flow of fluid
through a space element is equal to the change of density within the space
element.

$$\rho_0\, \mathrm{div}\, \mathbf{u} = -\frac{\partial\rho}{\partial t} \tag{40}$$

is one form of the continuity equation, where

$$\mathrm{div}\, \mathbf{u} = \frac{\partial u_x}{\partial x} + \frac{\partial u_y}{\partial y} + \frac{\partial u_z}{\partial z}.$$
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It IS the velocity vector with the orthogonal components w,, nnd u^, 
Pq denotes the resting density, and dpfdt the time dcnvative of density. 

Propagation of mechanical or acoustic waves requires a continuous 
exchange between potential and kinetic energy. In order to desenbe the 
process m simple terms, let us consider longitudinal waves m a bar of 
moderate cross-sectional area S The restoring force produces an ac- 
celeration of particles such that 


' da: = —apoS ax, 
dx 


(41) 


with dfjdx the change of restoring force with x, a = is the ac- 

celeration, po the density, and dx the element of length. Substituting 
Eq 37 for we obtain 


or 




(42) 


§!i = LSI 

9a:* c* 9t* ’ 


the differential equation of motion for longitudinal plane waves 
quantity 


c = ^/W<, 


The 

(43) 


denotes the velocity of ^ave propagation, which is also called phase 
velocity 

It can be shown that any function of the argument ct ± x is a solution
of the differential Eq 42. Thus,

$$\xi = f_1(ct - x) + f_2(ct + x) \qquad (44)$$

is the most general solution, where $f_1$ and $f_2$ are arbitrary well-behaved
functions. It indicates that events that take place at the location $x_1$ occur
at $x_2$ with a time delay $\Delta t = (x_2 - x_1)/c$, and are propagated, therefore,
with the velocity $c = (x_2 - x_1)/\Delta t$. Since the functions $f_1$ and $f_2$ do not
change with x or t, a disturbance produced at $x_1$ is transmitted through
the medium without distortion, a phenomenon of fundamental
importance in communication. This is strictly true only within the validity
of Hooke's law and when there are no dissipative losses. Hooke's law
almost always holds for small disturbances. Dissipation of energy is
always present to a greater or lesser extent; it limits the useful range of
information transmission.
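The statement that any well-behaved function of ct − x satisfies Eq 42 can be checked numerically. The following is a minimal sketch and not part of the original derivation: the Gaussian pulse shape, the value of the propagation velocity, and the step size are arbitrary assumptions chosen only for illustration.

```python
import math

C = 3.4e4  # assumed propagation velocity in cm/sec (illustrative value only)


def f1(s):
    """Arbitrary well-behaved pulse shape; a Gaussian is assumed here."""
    return math.exp(-s * s)


def xi(x, t):
    """Wave travelling in the positive x direction: xi = f1(ct - x)."""
    return f1(C * t - x)


def second_difference(g, u, h):
    """Central-difference estimate of the second derivative of g at u."""
    return (g(u + h) - 2.0 * g(u) + g(u - h)) / (h * h)


def wave_equation_residual(x, t, h=1e-3):
    """Return d2xi/dx2 - (1/c^2) d2xi/dt2, which should be close to zero."""
    d2_dx2 = second_difference(lambda u: xi(u, t), x, h)
    d2_dt2 = second_difference(lambda u: xi(x, u), t, h / C)
    return d2_dx2 - d2_dt2 / (C * C)


if __name__ == "__main__":
    print(wave_equation_residual(x=0.5, t=3.0e-5))
```

The residual stays at the level of rounding error for any smooth pulse shape substituted for `f1`, which is the numerical counterpart of the distortion-free transmission described above.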

Equation 44 is called a wave equation. The function $f_1$ indicates wave
propagation in the positive x direction, $f_2$ in the negative x direction. One
of the most useful forms of the wave equation is the simple harmonic
function, which can be written either in terms of complex numbers, that is,

$$\xi = A e^{jk(ct - x)} + B e^{jk(ct + x)}, \qquad (45)$$

or in terms of sine or cosine functions,

$$\xi = A \sin k(ct - x) + B \sin k(ct + x). \qquad (46)$$

In these equations, $k = 2\pi f/c$ with f meaning frequency of oscillation.
The constant k is called the wave constant because the wavelength $\lambda = c/f$
and, consequently, $k = 2\pi/\lambda$. The wavelength is the distance covered by
the disturbance in one period $T = 1/f$. For simple harmonic functions,
Eq 42 may be simplified to

$$\frac{\partial^2\xi}{\partial x^2} = -k^2\xi, \qquad (47)$$

since $\partial^2\xi/\partial t^2 = -k^2c^2\xi$.


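Wave equations for fluids are similar to those derived for longitudinal
plane waves in solids. Their exact form depends on the geometry of the
system, however. In order to derive a general three-dimensional differential
equation, we introduce the velocity potential $\phi$ such that

$$u_x = -\frac{\partial\phi}{\partial x}, \quad u_y = -\frac{\partial\phi}{\partial y}, \quad \text{and} \quad u_z = -\frac{\partial\phi}{\partial z}.$$

Equation 40 can now be written in the form

$$\nabla^2\phi = \frac{1}{\rho_0}\frac{\partial\rho}{\partial t}. \qquad (48)$$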


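Let us denote by p the excess pressure over the static pressure $P_0$ and
consider the forces acting on an element dx dy dz. The restoring force in
the x direction is equal to $-(\partial p/\partial x)\,dx\,dy\,dz$, and the inertia force for
small disturbances to $\rho(\partial u_x/\partial t)\,dx\,dy\,dz$. Applying Newton's second law,
we obtain

$$\frac{\partial p}{\partial x}\,dx + \rho\frac{\partial u_x}{\partial t}\,dx = 0.$$

Similarly,

$$\frac{\partial p}{\partial y}\,dy + \rho\frac{\partial u_y}{\partial t}\,dy = 0, \quad \text{etc.} \qquad (49)$$

Addition of the three expressions yields

$$\frac{\partial p}{\partial x}\,dx + \frac{\partial p}{\partial y}\,dy + \frac{\partial p}{\partial z}\,dz + \rho\frac{\partial}{\partial t}(u_x\,dx + u_y\,dy + u_z\,dz) = 0,$$

which is equivalent to

$$dp - \rho\frac{\partial}{\partial t}(d\phi) = 0. \qquad (50)$$

Integration of Eq 50 leads to

$$p = \rho\frac{\partial\phi}{\partial t} + K, \qquad (51)$$

where K is a constant of integration. When no disturbance is present in
the fluid, both $\partial\phi/\partial t$ and p vanish, so that K = 0.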
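In a further step, we use the definition of density

$$\rho = \frac{M}{V}, \qquad (52)$$

where M is the mass of the volume V. Differentiating, we obtain the
expression

$$d\rho = -M\frac{dV}{V^2},$$

which can be rewritten in the form

$$\frac{d\rho}{\rho_0} = -\frac{dV}{V}. \qquad (53)$$

Replacement of $-dV/V$ by $d\rho/\rho_0$ in Eq 38 produces

$$B = \frac{dP}{d\rho/\rho_0}. \qquad (54)$$

Introducing $P = P_0 + p$ and expressing both P and $\rho$ as a function of
time, we obtain

$$\frac{\partial p}{\partial t} = \frac{B}{\rho_0}\frac{\partial\rho}{\partial t}. \qquad (55)$$

By combining Eqs 51 and 55, and taking $\rho \approx \rho_0$ for small disturbances,
it is possible to eliminate p:

$$\frac{1}{\rho_0}\frac{\partial\rho}{\partial t} = \frac{\rho_0}{B}\frac{\partial^2\phi}{\partial t^2}, \qquad (56)$$

and substituting for $(1/\rho_0)(\partial\rho/\partial t)$ in Eq 48, we finally obtain

$$\nabla^2\phi = \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2}, \qquad (57)$$

with $c = \sqrt{B/\rho_0}$ determining the velocity of propagation.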
Acoustics often deals with small sound sources and with spherical
waves emitted by them. Under such conditions Eq 57 is expressed in
polar coordinates. Since it is well beyond the scope of this chapter
to discuss these matters, however, we shall limit ourselves to the
consideration of plane waves. They are approximated by spherical waves at a
sufficiently large distance from the source. Waves of this kind are also
propagated in tubes whose diameter is small compared to the wavelength.
To a sufficient approximation, the ear canal may be regarded as such a
tube. In plane waves, $\partial\phi/\partial y$ and $\partial\phi/\partial z$ vanish, and Eq 57 becomes

$$\frac{\partial^2\phi}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2}, \qquad (58)$$

which is analogous to Eq 42. As a consequence,

$$\phi = \phi_+ + \phi_- = A e^{jk(ct - x)} + B e^{jk(ct + x)} \qquad (59)$$

is a solution of Eq 58 for simple harmonic time functions. The variable
pressure p, called sound pressure for audible frequencies of vibration,
follows from Eq 51,

$$p = \rho\frac{\partial\phi}{\partial t} = jkc\rho(\phi_+ + \phi_-), \qquad (60)$$

and the particle velocity from the definition of the potential,

$$u = -\frac{\partial\phi}{\partial x} = jk(\phi_+ - \phi_-). \qquad (61)$$

In Eqs 59, 60, and 61, $\phi_+$ denotes wave propagation in the positive x
direction, $\phi_-$ in the opposite direction. The second wave is usually due to
reflection of a primary wave at some boundary, so that in the absence of
reflection $\phi_-$ vanishes. In this situation,

$$Z_s = \frac{p}{u} = \rho c \qquad (62)$$

defines the specific impedance of the medium for progressive plane waves.
In air, under normal atmospheric conditions, its numerical value amounts
to approximately 41.5 g/cm² sec. It should be noted that the impedance is
real, so that sound pressure and particle velocity are in phase.





If waves are propagated in a tube of cross-sectional area S, the acoustic
impedance for progressive waves amounts to

$$Z_a = \frac{\rho c}{S}. \qquad (63)$$

For instance, an average ear canal has a cross-sectional area of about 0.38
cm², so that its acoustic impedance would amount to approximately 110
g/cm⁴ sec if there were no reflection at the eardrum.
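The numerical values quoted for air and for the ear canal can be reproduced from Eq 39, the relation c = √(B/ρ), and the specific impedance ρc. The sketch below is only illustrative: the ratio of specific heats, the static pressure, and the density of air are assumed round values that do not appear in the text.

```python
import math

GAMMA = 1.4          # ratio of specific heats for air (assumed)
P_STATIC = 1.013e6   # static pressure in dynes/cm^2 (assumed, about one atmosphere)
RHO_AIR = 1.21e-3    # density of air in g/cm^3 (assumed, room temperature)
EAR_CANAL_AREA = 0.38  # cross-sectional area in cm^2, as quoted in the text


def bulk_modulus(gamma, pressure):
    """Bulk modulus of a gas, B = gamma * P (Eq 39)."""
    return gamma * pressure


def wave_velocity(bulk, density):
    """Velocity of propagation, c = sqrt(B / rho)."""
    return math.sqrt(bulk / density)


def specific_impedance(density, velocity):
    """Specific impedance for progressive plane waves, rho * c (Eq 62)."""
    return density * velocity


def acoustic_impedance(z_specific, area):
    """Acoustic impedance of a tube of cross-sectional area S, rho * c / S."""
    return z_specific / area


if __name__ == "__main__":
    c = wave_velocity(bulk_modulus(GAMMA, P_STATIC), RHO_AIR)
    z = specific_impedance(RHO_AIR, c)
    print(c, z, acoustic_impedance(z, EAR_CANAL_AREA))
```

With these assumed constants the specific impedance comes out close to the 41.5 g/cm² sec cited above, and dividing by the ear-canal area gives a value near 110 g/cm⁴ sec.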

When the tube in which acoustic waves are propagated is not of infinite
length, but is terminated by an acoustic impedance different from its own
impedance, a partial wave reflection takes place. At the place of reflection
the sound-pressure ratio between the reflected and the incident wave
amounts to

$$\frac{p_r}{p_i} = \frac{Z_2 - Z_1}{Z_2 + Z_1}, \qquad (64)$$

where $Z_1$ is the impedance of the tube and $Z_2$ the terminating impedance.
When the real and imaginary parts of the impedances are separated, the
equation becomes

$$\frac{p_r}{p_i} = \frac{R_2 - R_1 + j(X_2 - X_1)}{R_2 + R_1 + j(X_2 + X_1)}, \qquad (65)$$

or

$$\frac{p_r}{p_i} = \beta e^{j\theta}, \qquad (66)$$

with

$$\beta = \sqrt{\frac{(R_2 - R_1)^2 + (X_2 - X_1)^2}{(R_2 + R_1)^2 + (X_2 + X_1)^2}} \qquad (67)$$

and

$$\theta = \arctan\frac{X_2 - X_1}{R_2 - R_1} - \arctan\frac{X_2 + X_1}{R_2 + R_1}. \qquad (68)$$

At a distance l from the place of reflection the ratio of pressures becomes

$$\frac{p_r}{p_i} = \beta e^{j(\theta - 2kl)}, \qquad (69)$$

so that the total pressure amounts to

$$p = p_i\left[1 + \beta e^{j(\theta - 2kl)}\right], \qquad (70)$$

with its amplitude varying as a function of frequency between
$P_i(1 + \beta)$ and $P_i(1 - \beta)$.
The minimum occurs when $\theta - 2kl = -\pi$, or $l = \lambda[(\theta + \pi)/4\pi]$. For a
very large terminating impedance, $\theta \to 0$ and $l = \lambda/4$. For a very small
terminating impedance, $\theta \to \pi$ and $l = \lambda/2$. In the first instance, the
sound pressure at the end of the tube (x = 0) may be considerably higher
than at x = l. This phenomenon affects quite considerably the frequency
response of the ear. See Kinsler & Frey (1962).
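The reflection relations can be evaluated directly with complex arithmetic. The sketch below assumes the usual form of the reflection factor, p_r/p_i = (Z2 − Z1)/(Z2 + Z1) = β e^{jθ}, and the minimum condition θ − 2kl = −π stated in the text; the function names and the example terminating impedance are hypothetical and serve only as illustration.

```python
import cmath
import math


def reflection_factor(z1, z2):
    """Return (beta, theta) with p_r/p_i = (Z2 - Z1)/(Z2 + Z1) = beta*exp(j*theta)."""
    ratio = (complex(z2) - complex(z1)) / (complex(z2) + complex(z1))
    return abs(ratio), cmath.phase(ratio)


def pressure_extremes(p_incident, beta):
    """Largest and smallest total pressure amplitude along the tube."""
    return p_incident * (1.0 + beta), p_incident * (1.0 - beta)


def first_minimum_distance(theta, wavelength):
    """Distance l from the termination at which theta - 2kl = -pi."""
    return wavelength * (theta + math.pi) / (4.0 * math.pi)


if __name__ == "__main__":
    # Tube impedance of 110 as quoted for the ear canal; the complex
    # termination is a made-up value, not a measured eardrum impedance.
    beta, theta = reflection_factor(110.0, 300.0 + 200.0j)
    print(beta, theta, pressure_extremes(1.0, beta))
    print(first_minimum_distance(theta, wavelength=10.0))
```

For a nearly rigid termination the computed distance approaches one quarter of a wavelength, and for a very small terminating impedance it approaches one half, in agreement with the two limiting cases named above.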


2. SOUND TRANSMISSION IN THE EAR


2.1 Specification of the Auditory Stimulus


In auditory experiments the sound stimulus is usually specified in terms
of sound pressure in dynes/cm², or in terms of sound pressure level (SPL)
in decibels above 0.0002 dyne/cm². As long as we deal with plane sound
waves or with sound energy generated in a small cavity, like that enclosed
under an earphone cushion, such a specification is sufficient. In plane
waves, the particle velocity is in phase with the sound pressure and its
magnitude is determined by the specific impedance:

$$u = \frac{P}{\rho c}. \qquad (71)$$
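The two stimulus specifications mentioned above are easily converted into one another. The following sketch assumes the conventional definition of sound pressure level as 20 log₁₀ of the pressure ratio, which the text implies but does not write out; the function names are hypothetical.

```python
import math

P_REF = 0.0002     # reference sound pressure in dynes/cm^2, as given in the text
RHO_C_AIR = 41.5   # specific impedance of air in g/cm^2 sec, as quoted for Eq 62


def sound_pressure_level(pressure):
    """SPL in decibels above P_REF; pressure in the same convention as P_REF."""
    return 20.0 * math.log10(pressure / P_REF)


def particle_velocity(pressure, rho_c=RHO_C_AIR):
    """Particle velocity of a plane progressive wave, u = P / (rho * c) (Eq 71)."""
    return pressure / rho_c


if __name__ == "__main__":
    for p in (0.0002, 0.02, 1.0):
        print(p, sound_pressure_level(p), particle_velocity(p))
```

A pressure equal to the reference gives 0 dB, and every tenfold increase in pressure adds 20 dB.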


In a small enclosure the ratio between sound pressure and particle velocity
is equal to the acoustic input impedance. As a consequence, all sound
parameters, including sound intensity, are uniquely determined by the
sound pressure. The sound intensity of a plane wave follows from the
equation



$$I = \frac{Pu}{2} = \frac{P^2}{2\rho c}, \qquad (72)$$


and it is expressed in ergs/cm² sec in the cgs system. One watt sec is
equal to 10⁷ ergs. The sound pressure and particle velocity are expressed
in peak values. If, however, a small sound source is placed close to the
listener's head, the waves are spherical rather than plane, and the relations
62 and 71 do not hold. A measurement of sound pressure does not specify
the acoustic event completely. Since in laboratory experiments such
conditions are usually avoided, we will not discuss them any further.
Even if we restrict ourselves to consideration of plane waves and small
enclosures, measurement of sound pressure does not completely specify
the stimulus. It is necessary to specify the conditions of measurement.
For instance, it is possible to determine the sound pressure at the entrance
to the ear canal or, with a probe microphone, in the vicinity of the eardrum.
Two procedures have become standard. In one of them, sound is generated
by a loudspeaker placed in an anechoic chamber, that is, a room with
acoustically treated walls. The listener is placed 6 ft from the loudspeaker,
facing it. The sound pressure is measured at the location of the listener's
head before introducing the listener into the sound field. The measured
sound pressure is different from that at the entrance to the ear canal
because of sound diffraction at the listener's head. In the second method,
the test earphone is calibrated on a standard 6 cc cavity serving as a
coupling device between the earphone and the measuring microphone.
Since the acoustic impedance of the standard cavity differs from that of
the ear, the sound pressure produced at the microphone is not the same as
that produced in the ear. The difference depends on sound frequency and
the mechanical impedance of the earphone. Both methods lead to an
arbitrary specification of the stimulus. This is of no serious disadvantage
in a purely empirical approach to auditory problems. However, it does
constitute a handicap in an analytical treatment of auditory characteristics.
From the latter point of view, it is helpful to define the stimulus at the
eardrum, or even further, at the level of sensory cells. In so doing, we
knowingly eliminate a part of the sensory process. No generality is lost,
however, if we are able to specify the stimulus transformation that takes
place in the sound transmitting part of the ear. The transformed stimulus
may be reintroduced into the stimulus-response relationship as a new
variable. We obtain

$$\psi = \phi_t[f(s)], \qquad (73)$$


where $\psi$ is the response function, s is the acoustic stimulus, f(s) is
[...] sensory cells, and $\phi_t$ the response function in
terms [...]. The advantage of the operation is a substantial
[...] function f(s) is quite complex,
[...]. The operation
also [...] complete separation of mechanical and neural
variables. A [...] separation is not possible because sound
transmission in [...] is controlled to a certain extent by two muscles
that are [...] a neuromechanical feedback loop. The effect
[...] importance, however, since the feedback becomes effective
only at high sound intensities.

[...] the transfer function f(s) and, more
[...] at the level of sensory cells
as a function [...] in a free sound field at the
location [...] listener's head. The anatomical drawing of
Fig 6 may be helpful [...] the situation. The sound waves enter
the external [...] propagated up to the eardrum,
where part of their energy is reflected and part is transformed into vibration




Fig 6 A semidiagrammatic drawing of [...]
middle ear, which also shows the inner ear [...]
permission from Davis (1959, p 56)

of the middle ear system. This vibration is communicated through the
[...] in particular, to the
stapes and the oval window of the
cochlea. The cochlear spiral is [...] station of the sound-transmitting
channel of the ear. Its [...] considerable effect on
the action of the [...]
the eardrum. For this [...] the analysis has to begin with the cochlea
and proceed backwards.


2.2 Sound Transmission in the Cochlea

Fig 7 A schematic drawing of sound transmitting structures of the middle and inner
ear. Adapted with permission from v. Békésy (1960a, p 11).

The cochlea is a spiral canal of two and one-half turns which is filled
with liquid and surrounded by bony walls. The canal is divided
lengthwise into the scala vestibuli and the scala tympani. The scala
vestibuli is connected [...] may be
considered to begin at [...]. The scala tympani begins at the
round window and ends [...] cochlea in a small opening
providing a [...] partition (Fig 7). The oval window is
filled almost completely by the footplate of the stapes, the
[...] the ossicular chain. The footplate is attached to the
perimeter of the oval [...] ligament. The round window


opens into the tympanic cavity and is separated from it by a thin membrane.
The cochlear partition itself consists in part of a bony plate, the lamina
spiralis, and in part of an almost triangular canal, the scala media (Fig 8).
The canal is filled with endolymph and surrounded by two flexible walls,
Reissner's membrane and the basilar membrane, and by the rigid
outer wall of the cochlea. The basilar membrane supports the organ of
Corti containing the sensory hair cells. According to v. Békésy, its
mechanical properties control the sound transmission in the cochlea to a
considerable extent. Reissner's membrane is very thin and its
mechanical effect appears to be negligible.

Fig 8 Cross section of the cochlear partition of the guinea pig. Reproduced with
permission from Davis (1959, p 5$7)

[...] treatment of sound transmission in the cochlea, we
[...] place the x axis along the cochlear canals in the plane
of the basilar membrane (Fig 9). With the help of v. Békésy's
observations, we can assume that the waves that are propagated along the
cochlear canals are long compared to the linear cross-sectional dimensions
of the canals (v. Békésy, 1947, 1960). Under these conditions the problem
[...] no need for the remaining
orthogonal space coordinates (Zwislocki, 1946, 1948, 1950, 1953).
When, because of sound pressure at the eardrum, the stapes is pressed
into the cochlea, it compresses the perilymph in the scala vestibuli and
[...] The latter occurs through the deformation of the
cochlear partition. This compression is equalized almost completely by
the outward motion of the round window membrane. Fletcher (1951, 1953)
has shown that the remaining compression can be neglected and the
cochlear fluids regarded as incompressible. Let us now take a volume
element of length dx and of a cross-sectional area $S_v$ of the scala vestibuli.
Applying the continuity equation for incompressible fluids ($\partial\rho/\partial t = 0$)
(Lamb, 1945; Lindsay, 1960), we see that


dx 


dX =: V dx. 


(74) 


Fig 9 A schematic drawing of an unrolled cochlea in a longitudinal section

where v means the volume velocity of the cochlear partition per unit
length, and $\bar{u}_v$ is the average particle velocity in the x direction, i.e.,


Ut, *= 



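For small $\bar{u}_v$ and small $\partial S_v/\partial x$, and neglecting second-order small terms,
Eq 74 can be rewritten as follows:

$$\frac{\partial(\bar{u}_v S_v)}{\partial x} = S_v\frac{\partial\bar{u}_v}{\partial x} + \bar{u}_v\frac{\partial S_v}{\partial x} \approx S_v\frac{\partial\bar{u}_v}{\partial x}. \qquad (75)$$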


Similarly, we have for the scala tympani 



(76) 


The sign in front of v in Eq 76 is reversed because the fluid in the scala
tympani moves in the opposite direction from that in the scala vestibuli.
Assuming a sound pressure $p_v$ in the scala vestibuli and $p_t$ in the scala
tympani, the pressure difference amounts to

$$p = p_v - p_t. \qquad (77)$$

This pressure difference must be equal to the acoustic impedance of the
partition per unit length multiplied by the volume velocity v:

$$p = Z_p v. \qquad (78)$$

The force acting on an element of unit length in the x direction amounts 
to 

+ ( 79 ) 

OX ot 


in the scala vestibuli, and to 





(80) 


in the scala tympani. In both equations, $\rho$ is the density of the perilymph
and $R_f$ is the resistance per unit length of the canal. Differentiating Eqs
79 and 80 with respect to x and neglecting small terms of the second order,
we obtain




ax
and substituting +v for $S_v(\partial\bar{u}_v/\partial x)$ and −v for $S_t(\partial\bar{u}_t/\partial x)$,

<- Sv 

'3?=-'’ i; 
c ^ 3v^ „ 


( 82 ) 


Because of the relation given in Eq 77, it is possible to sum Eqs 82:

$$\frac{\partial^2 p}{\partial x^2} = \frac{\partial^2 p_v}{\partial x^2} - \frac{\partial^2 p_t}{\partial x^2}. \qquad (83)$$


Substitution for v from Eq 78 finally produces 


dx^ 




(84) 


the desired differential equation in terms of p (Zwislocki, 1946, 1948).
Equation 84 can be abbreviated by introducing $S = S_vS_t/(S_v + S_t)$ and
$R_f/S = (R_{fv}S_t + R_{ft}S_v)/(S_vS_t)$, and simplified by restricting it to simple
harmonic functions of time:

$$\frac{\partial^2 p}{\partial x^2} = \frac{R_f + j\omega\rho}{SZ_p}\,p. \qquad (85)$$


In order to determine the vibration pattern of the cochlear partition
and have an insight into the stimulus conditions at the sensory cells, it is
necessary to solve the differential equation, Eq 85. Because of the
dependence of S, $R_f$, and $Z_p$ on x, no general solution is available, and so,
before proceeding any further, these parameters must be expressed as
functions of x.

Only meager data exist with respect to the cross-sectional area of the
cochlea. Figure 10 shows this area, that is, the area of the scala vestibuli
plus that of scala tympani, as derived from graphs of three cochleas
published by E. G. Wever (1949) and of one cochlea measured by the
author (Zwislocki, 1948). The cross-sectional area is relatively large near
the oval window and decreases to about one-fifth of that value toward the
helicotrema. The functional relationship can be described by the equation

$$S_v + S_t = 5 \times 10^{-2}\,e^{-0.5x}, \qquad (86)$$

which produces the straight line in Fig 10. The approximation is not very
accurate, but the events in the cochlea depend only on the square root of
the cross-sectional area, so that the effective error remains small. The area
S of Eq 85 follows from $S = S_vS_t/(S_v + S_t)$ or, accepting that to the first
order of approximation $S_v \approx S_t$, $S = (S_v + S_t)/4$. With the help of Eq 86
we obtain

$$S = 1.25 \times 10^{-2}\,e^{-0.5x} = S_0e^{-ax}. \qquad (87)$$

Fig 10 Cross-sectional area of the scalae vestibuli and tympani as a function of the
distance from the oval window. The points indicate two series of measurements; the
straight line is a mathematical approximation.

The resistance $R_f$, which is due to the viscosity of the fluid, can be
derived from equations of fluid motion in circular pipes. Unfortunately,
the complete derivation is rather long, and we shall have to content
ourselves with the final results (Kinsler & Frey, 1950; Olson, 1957). For
conditions encountered in the cochlea, that is, for a tube of radius
$a > 10\sqrt{\mu/\rho\omega}$, where $\mu$ is the coefficient of viscosity, the resistance per
unit length amounts to

$$R = \frac{\sqrt{2\mu\rho\omega}}{a}, \qquad (88)$$

which can be expressed in terms of cross-sectional area by introducing
$S = \pi a^2$:

$$R = \sqrt{\frac{2\pi\mu\rho\omega}{S}}. \qquad (89)$$

Maintaining the approximation $S_v \approx S_t$, which leads to $R_{fv} \approx R_{ft}$, we
obtain from the definition of $R_f$ in Eq 85

$$R_f = S\,\frac{R_{fv}S_t + R_{ft}S_v}{S_vS_t} \approx R_{fv}. \qquad (90)$$

Introducing $S^* = (S_v + S_t)/2$ as the best approximation of the
cross-sectional area, we obtain from Eq 89

$$R_f = \sqrt{\frac{2\pi\mu\rho\omega}{(S_v + S_t)/2}} = r\sqrt{\omega}\,e^{ax/2}. \qquad (91)$$

The numerical value for the coefficient of viscosity was found by v.
Békésy (1942, 1960) to amount approximately to 2 × 10⁻² cgs units; the
density of the perilymph is very nearly that of water, i.e., ρ = 1 gram/cm³.
With these values, and in agreement with Eq 86, r ≈ 2.24 cgs units and
a = 0.5 cm⁻¹.
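The constant r can be recomputed from the viscosity, the density, and the cross-sectional area. The sketch below assumes the square-root form of the tube resistance, R = √(2πμρω/S) with S taken as (S_v + S_t)/2, and reads the numerical values as μ = 2 × 10⁻² cgs units and S_v + S_t = 5 × 10⁻² cm² at the oval window; both the functional form and these readings are assumptions of the example.

```python
import math

MU = 2.0e-2      # coefficient of viscosity in cgs units (assumed reading)
RHO = 1.0        # density of the perilymph in g/cm^3
AREA_0 = 5.0e-2  # S_v + S_t at the oval window in cm^2 (assumed reading of Eq 86)
A = 0.5          # decay constant of the area in 1/cm


def total_area(x):
    """S_v + S_t as a function of distance x (cm) from the oval window."""
    return AREA_0 * math.exp(-A * x)


def fluid_resistance(omega, x):
    """R_f = sqrt(2*pi*mu*rho*omega / S*), with S* = (S_v + S_t)/2."""
    s_star = total_area(x) / 2.0
    return math.sqrt(2.0 * math.pi * MU * RHO * omega / s_star)


def resistance_coefficient():
    """The constant r in R_f = r * sqrt(omega) * exp(a*x/2)."""
    return fluid_resistance(1.0, 0.0)


if __name__ == "__main__":
    print(resistance_coefficient())
```

The value returned by `resistance_coefficient` agrees with the r ≈ 2.24 cgs units stated above, which supports the assumed reading of the two constants.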

The remaining parameter to be determined is the acoustic impedance
of the cochlear partition, $Z_p$. It can be expressed in general terms by the
formula

$$Z_p = R_p + j\left(\omega M - \frac{1}{\omega C}\right), \qquad (92)$$

where $R_p$, M, and C mean respectively the acoustic resistance, mass, and
compliance per unit length. It is possible to show that the mass M is
negligible in those portions of the cochlea where the vibration amplitude is
substantial (Ranke, 1942; Zwislocki, 1948). The compliance C can be
derived from a static measurement performed by v. Békésy (1941). He
loaded the cochlear partition over its entire length with a pressure
equivalent to a column of 1 cm of water and measured its displacement. The
empirical results can be closely approximated by the function

$$C = C_0e^{hx}, \qquad (93)$$

where $C_0 = 4 \times 10^{-10}$ cm⁴/dyne and h = 1.5 cm⁻¹.
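The two exponential approximations, Eq 87 for the area and Eq 93 for the compliance, can be tabulated along the partition. In the sketch below the constants are read as S₀ = 1.25 × 10⁻² cm², a = 0.5 cm⁻¹, C₀ = 4 × 10⁻¹⁰ cm⁴/dyne, and h = 1.5 cm⁻¹; the cochlear length of 3.5 cm is an assumption introduced only to evaluate the ratios between base and apex.

```python
import math

S0 = 1.25e-2          # effective area at the oval window in cm^2 (Eq 87)
A = 0.5               # decay constant of the area in 1/cm
C0 = 4.0e-10          # compliance at the oval window in cm^4/dyne (Eq 93)
H = 1.5               # growth constant of the compliance in 1/cm
COCHLEA_LENGTH = 3.5  # assumed length of the partition in cm (illustrative)


def effective_area(x):
    """S = S0 * exp(-a*x), with x in cm from the oval window."""
    return S0 * math.exp(-A * x)


def compliance(x):
    """C = C0 * exp(h*x), acoustic compliance per unit length."""
    return C0 * math.exp(H * x)


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 3.0, COCHLEA_LENGTH):
        print(x, effective_area(x), compliance(x))
```

Over the assumed length the area falls to somewhat less than one-fifth of its initial value, while the compliance grows by a factor well above 100, as the text describes.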

The agreement between the empirical and theoretical values is shown
in Fig 11. v. Békésy (1947, 1960) later performed another series of
experiments in which he measured the compliance of the basilar membrane
and of other structures by pressing small hairs against their surface.
Since in these experiments the distribution of forces was quite different
from that produced by the inner ear fluid, their results are inadequate for
calculating acoustic events in the cochlea.

Fig 11 Acoustic compliance per unit length of the cochlear partition as a function of
the distance from the oval window. The points indicate v. Békésy's measurements, and
the curve is a mathematical approximation.

The compliance of the cochlear partition, as measured by means of
fluid pressure, is low near the oval window and increases by a factor of
over 100 toward the helicotrema. It leads to the characteristic wave
pattern observed by v. Békésy (1928, 1942, 1943, 1947, 1960).

There are no direct data for the acoustic resistance $R_p$. Consequently,
its numerical value must be adjusted so that the theoretical computations
agree with v. Békésy's dynamic measurements. We shall assume that the
value of $R_p$ is independent of x, in agreement with v. Békésy's (1943, 1960)
finding that the logarithmic decrement of transients in the cochlea is
the same over the whole length of the partition, with the exception of the
apical portion, that is, near the helicotrema.
After introducing the expressions derived for the parameters $R_f$, S, and
$Z_p$, Eq 85 becomes

$$\frac{\partial^2 p}{\partial x^2} = \frac{(R_f + j\omega\rho)\,p}{S_0e^{-ax}\left[R_p + (1/j\omega C_0)\,e^{-hx}\right]}. \qquad (94)$$


Before Eq 94 can be solved analytically, it has to be simplified. It is
possible to deduce from v. Békésy's experiments that $R_p \ll 1/\omega C$ in those
portions of the cochlea where the amplitude of vibration is substantial.
Consequently, we can approximate the expression $1/(R_p + 1/j\omega C)$ by
$j\omega C + \omega^2C^2R_p$ and rewrite Eq 94 in the form

$$\frac{\partial^2 p}{\partial x^2} = \frac{(R_f + j\omega\rho)(\omega^2C^2R_p + j\omega C)}{S}\,p$$

or, regrouping the terms,

$$\frac{\partial^2 p}{\partial x^2} = -\frac{\omega^2\rho C}{S}\left[\left(1 - \frac{CR_pR_f}{\rho}\right) - j\left(\omega CR_p + \frac{R_f}{\omega\rho}\right)\right]p. \qquad (95)$$

In this equation, $CR_pR_f/\rho \ll 1$ for all values of ω and x that are of interest,
so that we can simplify it further to

$$\frac{\partial^2 p}{\partial x^2} = -\frac{\omega^2\rho C}{S}\left[1 - j\left(\omega CR_p + \frac{R_f}{\omega\rho}\right)\right]p. \qquad (96)$$

The last equation can be solved rather easily if we are allowed to consider
the expression in the bracket as independent of x (Zwislocki, 1946, 1948).
This is an acceptable approximation in view of the fact that the
variable expression $(\omega CR_p + R_f/\omega\rho) \ll 1$ for all combinations of values
of ω and x that produce a substantial vibration amplitude. We can
introduce, therefore,

$$K = 1 - j\left(\omega CR_p + \frac{R_f}{\omega\rho}\right), \qquad (97)$$

and attempt to solve the simplified equation

$$\frac{\partial^2 p}{\partial x^2} = -\frac{\omega^2\rho C_0}{S_0}\,Ke^{(a+h)x}\,p. \qquad (98)$$


After some formal transformations, it is found that the solution can be
written as a Hankel function of the zero order (Jahnke & Emde, 1945).
Hankel functions belong to the class of Bessel functions, and they form
complex conjugate pairs, $H^{(1)}$ and $H^{(2)}$. The function $H^{(1)}$ converges to
zero for infinite complex arguments when the imaginary part is positive;
the function $H^{(2)}$ does the same for negative imaginary terms. Since there
seems to be no noticeable reflection at the apex of the cochlea and the
wave amplitude decays to zero for large x, $H^{(2)}$ is the Hankel function
that satisfies the boundary conditions. Consequently, the appropriate
solution of Eq 98 is

$$p = A\,H_0^{(2)}\!\left[\frac{2\omega}{a + h}\sqrt{\frac{\rho C_0K}{S_0}}\;e^{(a+h)x/2}\right]. \qquad (99)$$

In order to evaluate the argument of the Hankel function, we first
investigate the complex term $\sqrt{K}$. Using the definition given by Eq 97 and
remembering that

$$\left(\omega CR_p + \frac{R_f}{\omega\rho}\right) \ll 1$$

for the combinations of values of ω and x that are of interest, we can write
to a close approximation

$$\sqrt{K} \approx 1 - \frac{j}{2}\left(\omega R_pC_0e^{hx} + \frac{R_f}{\omega\rho}\right). \qquad (100)$$

Equation 99 can now be written in the form

$$p = A\,H_0^{(2)}\!\left\{Q\omega\left[1 - \frac{j}{2}\left(\omega R_pC_0e^{hx} + \frac{R_f}{\omega\rho}\right)\right]e^{(a+h)x/2}\right\}, \qquad (101)$$

with

$$Q = \frac{2}{a + h}\sqrt{\frac{\rho C_0}{S_0}}.$$

It describes pressure waves that are propagated from the oval window
toward the helicotrema with a monotonically decreasing amplitude and
wavelength (Zwislocki, 1948). This becomes more evident when Eq 101
is approximated by an exponential function (Jahnke & Emde, 1945):

p = Acxp j^- - le ^ 

exp (102) 

The approximation holds for values of the argument greater than one,
which, in fact, prevail in the cochlea.
The vibration pattern of the scala media is found by dividing Eq 102 by
$j\omega Z_p$ (Zwislocki, 1948). We obtain

n = — s «p _ iQwX 

JioZ, L 4 - » 

-IQ- exp [- (103) 

P J 

The last equation indicates that the displacement of the partition is
approximately in phase with the pressure difference between the scala
vestibuli and scala tympani. Its amplitude does not decay monotonically
toward the helicotrema, however, but reaches a maximum at a location
along the scala media which depends on the frequency. This result
agrees with v. Békésy's (1943, 1947, 1960) visual observations. The
theoretical amplitude distribution for an 800 cps tone is compared to that
recorded by v. Békésy (1943, 1960) on one inner-ear specimen in Fig 12.
The agreement between the two curves with respect to both the location
of the maximum and the shape lies within the range of individual
variability that can be inferred from v. Békésy's data.

Fig 12 Amplitude distribution along the cochlear partition for an 800 cps tone

The vibration maximum occurs where $\partial\eta/\partial x = 0$. Its locus is plotted in
Fig 13 together with three series of v. Békésy's (1942, 1943, 1947, 1960)
measurements. A further validation of the theory as well as of v. Békésy's
measurements on ear preparations is provided by the correlation between
the frequency of maximum hearing loss and the locus of maximum damage
to the cochlear structures. Evidence to this effect was obtained by Crowe,
Guild, and Polvogt (cit. Wever, 1949). The position of the open rectangles
in Fig 13 was inferred from their data. The agreement between the two
series of empirical results and the theory appears quite satisfactory for the
present state of the art.

Fig 13 Location of the vibration maximum along the cochlear partition as a function
of sound frequency. The points indicate two kinds of experimental data; the curve
is theoretical.

When calculating the locus of maximum vibration at low frequencies, a
small modification has to be made in the resistance $R_f$ in order to obtain
results that agree with those of v. Békésy. It seems that the resistance $R_f$
does not vary with the square root of frequency, as indicated by Eqs 89
and 91, but is somewhat less. Since it plays a secondary role in the
mechanical events taking place in the cochlea, it was simply assumed to be
independent of frequency and equal to

$$R_f = 124\,e^{ax/2}. \qquad (104)$$

An intermediate formulation would have given a still better approximation 
to the empirical data 

There are three additional parameters of wave motion in the cochlea
that are relevant to the analysis of auditory characteristics: the phase of
displacement, the phase (wave) velocity, and the input impedance of the
cochlea.

The phase follows from the last exponential term of Eq 103,

0 = (105) 

Phase relationships at the vibration maximum have been considered as a
possible factor in the frequency analysis performed by the ear. The phase
at the locus of vibration maximum can be calculated by substituting for
ω and x in Eq 105 pairs of values that satisfy the condition $\partial\eta/\partial x = 0$.
The phase angle obtained in this way refers to the vibration of the cochlear
partition at x = 0. In order to compare the theoretical results with v.
Békésy's (1947, 1960) observations, it is necessary to use the phase of
stapes displacement as a reference. As will be explained later, the
displacement of the stapes lags behind the cochlear partition by π/2 at x = 0.
The phase angle at the locus of maximum vibration relative to the phase
angle of the stapes motion is plotted in Fig 14. Closed circles indicate v.
Békésy's observations. In view of experimental difficulties and the
dependence of the phase on the numerical values of several parameters,
the agreement must be considered satisfactory. It is particularly
noteworthy that the theory as well as the experiments indicate a slightly
increasing phase with frequency. Although such an increase could easily be
avoided by a moderate alteration of certain parameters, this fact in itself
indicates that no particular analytical function should be assigned to the
phase at the vibration maximum.

Fig 14 Phase angle at the locus of maximum vibration along the cochlear partition
as a function of sound frequency. The phase angle is plotted relative to the phase angle
of the stapes. Points indicate values obtained by v. Békésy; the curve is theoretical.

The phase velocity is of relevance to localization experiments. It can
be calculated from the imaginary exponent in Eq 103 by introducing the
time variable through the factor $e^{j\omega t}$. It follows that

= (106) 
and, after differentiation with respect to t while keeping the phase constant,

$$0 = \omega - \frac{Q\omega(a + h)}{2}\,e^{(a+h)x/2}\,\frac{dx}{dt}. \qquad (107)$$

Since, by definition, dx/dt is the phase velocity, we obtain by rewriting
Eq 107,

$$\frac{dx}{dt} = \frac{2}{Q(a + h)}\,e^{-(a+h)x/2}. \qquad (108)$$

Note that, according to Eq 108, the phase velocity is independent of
frequency, which agrees with model experiments of Tonndorf (1957). A
more exact calculation would reveal some frequency dependence, but the
effect is very small except for the lowest frequencies. The calculated
numerical values are shown by the curve of Fig 15. They are in excellent
agreement with the values derived from v. Békésy's (1947, 1960)
measurements, indicated by solid circles. Physiological measurements on guinea
pigs (Tasaki, Davis, & Legouix, 1952) are also in support of the theoretical
curve.
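The frequency-independent phase velocity can be evaluated from the cochlear constants. The sketch below rests on an assumption rather than on a formula quoted from the text: setting K ≈ 1 in Eq 98 gives a local wave constant ω√(ρC/S), so that the phase velocity becomes √(S₀/ρC₀)·e^{−(a+h)x/2}. The constants are read as S₀ = 1.25 × 10⁻² cm², C₀ = 4 × 10⁻¹⁰ cm⁴/dyne, a = 0.5 cm⁻¹, and h = 1.5 cm⁻¹, and the travel-time helper is a hypothetical addition.

```python
import math

RHO = 1.0      # density of the perilymph in g/cm^3
S0 = 1.25e-2   # effective area at the oval window in cm^2 (assumed reading)
C0 = 4.0e-10   # compliance at the oval window in cm^4/dyne (assumed reading)
A = 0.5        # decay constant of the area in 1/cm
H = 1.5        # growth constant of the compliance in 1/cm


def phase_velocity(x):
    """Phase velocity in cm/sec at distance x (cm), assuming K = 1 in Eq 98."""
    return math.sqrt(S0 / (RHO * C0)) * math.exp(-(A + H) * x / 2.0)


def travel_time(x):
    """Time in sec for a point of constant phase to travel from 0 to x."""
    c0 = phase_velocity(0.0)
    return (2.0 / ((A + H) * c0)) * (math.exp((A + H) * x / 2.0) - 1.0)


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 3.0):
        print(x, phase_velocity(x), travel_time(x))
```

Under these assumptions the wave starts at several tens of meters per second near the oval window and slows by a factor of e for every centimeter travelled; the travel time is the integral of dx divided by the local velocity.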

The specific input impedance of the cochlea is by definition 



(109)

Fig 15 Phase velocity of the transversal waves in the cochlea, plotted as a function of
the distance from the oval window. The points indicate values derived from v. Békésy's
experiments; the curve is theoretical.

In this formula, the pressure p follows from Eq 102. The velocity u
along the cochlear partition has to be determined by using Eqs 79 and 80.
Neglecting the small resistive term and adding both equations so as to
obtain the total pressure gradient, we obtain

$$\frac{\partial p}{\partial x} = -2\rho\frac{\partial u}{\partial t} \qquad (110)$$

on the assumption that $S_v \approx S_t$ or, with the time dependence $e^{j\omega t}$ for both p and u,

$$\frac{\partial p}{\partial x} = -j2\rho\omega u. \qquad (111)$$

The pressure gradient follows from Eq 99

^ = - 0 , ( 112 ) 

dx S(f 

or, since for argument values greater than one, 

^ ^ ii . P - ^0 e<«+/»^/2p (1 13) 

dx 

Wiih Eq 113, the expression for the velocity becomes 

( 114 ) 

2p
and that for the input impedance (x = 0),


(115) 


Except for very low and very high frequencies, K ≈ 1, so that the input
impedance amounts to approximately

11,000 cgs units. (116)

Thus, for most practical purposes, the specific input impedance of the
cochlea is independent of frequency and is real. It has a rather high
numerical value.
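The order of magnitude of the input impedance can be checked against the cochlear constants. The closed form used below, 2√(ρS₀/C₀), is an assumption of this sketch: it follows from combining Eq 111 with the large-argument behavior of the solution of Eq 98 for K ≈ 1, which makes the impedance at x = 0 equal to 2ρ times the phase velocity there. The constants are read as S₀ = 1.25 × 10⁻² cm² and C₀ = 4 × 10⁻¹⁰ cm⁴/dyne.

```python
import math

RHO = 1.0      # density of the perilymph in g/cm^3
S0 = 1.25e-2   # effective area at the oval window in cm^2 (assumed reading)
C0 = 4.0e-10   # compliance at the oval window in cm^4/dyne (assumed reading)


def cochlear_input_impedance(rho=RHO, s0=S0, c0=C0):
    """Specific input impedance at x = 0 for K = 1: 2 * sqrt(rho * S0 / C0)."""
    return 2.0 * math.sqrt(rho * s0 / c0)


if __name__ == "__main__":
    print(cochlear_input_impedance())
```

The result lies close to the 11,000 cgs units cited above and contains no frequency term, which is the sense in which the impedance is real and independent of frequency.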

The nature of the cochlear input impedance has important consequences
for sound transmission in the ear. First, it shows that sound pressure
generated in the vicinity of the oval window is in phase with the velocity
of the stapedial motion, and it precedes the stapedial displacement by 90°.
We have accordingly

X--, (117) 


so that 



(118) 


We note that the sound pressure across the cochlear partition increases in
direct proportion to frequency when the amplitude of the stapes
displacement is kept constant.

From the point of view of the transmission characteristic of the ear, it is
of interest to determine the maximum amplitude of vibration of the
cochlear partition as a function of frequency. This is particularly so when
we assume that the vibration maximum controls the neural excitation in
the vicinity of the threshold of audibility. The desired relation may be
found from Eq 103 by substituting for x the frequency expression that
satisfies the condition $\partial\eta/\partial x = 0$. By neglecting the insignificant
contribution of $R_f$, this expression can be written in the form


X sa — ^ — {In (3d — a) — In [(ct 4* 3^)Qi?pCo] — 2 In tu} (119) 
a -I- 3d 

Substituted in Eq 103, it produces after elementary simplifications 

where B includes all constant terms except the pressure amplitude Now, 
applying Eq 118, we obtain 


i]ia»x — Xj 


( 121 ) 
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Fig 16 Magnitude of the maximum vibration of the cochlear partition for a constant 
amplitude of the stapes, plotted as a function of sound frequency. The points are experi- 
mental, the curve is theoretical. 

the maximum amplitude of the volume displacement of the cochlear 
partition for a constant displacement of the stapes. According to Eq. 121 » 
this amplitude increases with the 0.6 power of frequency. Figure 16 
demonstrates that v. Bek^sy (1943, 1960) obtained a similar result by 
direct observation. 

The preceding analysis shows that we are able to deal reasonably well 
with the mechanical events in the cochlea. Given a specified input, for 
instance, the displacement amplitude of the stapes, it is possible to calculate 
the vibration of the cochlear partition, which probably constitutes the 
direct mechanical stimulus for the sensory cells. We now shall attempt to 
calculate the motion of the stapes for a given sound pressure at the eardrum. 


2.3 Sound Transmission in the Middle Ear 


The main acoustic function of the middle ear appears to be to adapt the
high input impedance of the cochlea to the low specific impedance of air.
In this way the transfer of acoustic energy is improved (Helmholtz, cit.
Wever & Lawrence, 1954; v. Békésy, 1941).

The average acoustic energy per unit volume of a plane wave amounts to

$$E = \frac{PU}{2c}, \tag{122}$$

where P is the pressure amplitude and U the amplitude of particle velocity.
In most acoustic problems, the quantity introduced is the average rate of
energy flow through a unit area normal to the direction of wave
propagation. Such energy flow is called intensity and is defined by the formula

$$I = \frac{PU}{2}, \tag{123}$$

which, because of Eq. 62, can also be written in the form

$$I = \frac{P^2}{2\rho c}. \tag{124}$$

When a plane wave encounters a discontinuity where the impedance
changes from $Z_1$ to $Z_2$, part of its energy is reflected, and the ratio of
sound pressure of the reflected and incident waves follows from Eq. 64.
Squaring this equation, we obtain the ratio of the reflected and incident
intensities:

$$\frac{I_r}{I_i} = \left(\frac{Z_2 - Z_1}{Z_2 + Z_1}\right)^2. \tag{125}$$

The transmitted intensity is

$$I_t = I_i\,\frac{4 Z_2/Z_1}{(1 + Z_2/Z_1)^2}. \tag{126}$$

It is equal to the incident energy when $Z_2 = Z_1$ and decreases as the
impedance ratio either increases or decreases. Ideally, the specific
impedance measured at the eardrum should be equal to the specific impedance
of air for plane waves.

The impedance transformation in the middle ear is achieved by a high
ratio between the surface areas of the eardrum and of the oval window and
by a lever ratio of the ossicular chain. The situation is schematized in
Fig. 17, where $P_d$ and $A_d$ indicate the sound pressure and the area of the
eardrum, respectively, $P_s$ and $A_s$ the corresponding parameters of the
oval window, $l_m$ the lever of the malleus, and $l_i$ the lever of the incus.
The system is in equilibrium when the acting moments are in balance, that
is, when

$$P_d A_d l_m = P_s A_s l_i. \tag{127}$$

The average velocity of motion of the eardrum is related to the average
velocity of the stapes by the equation

$$\frac{v_d}{v_s} = \frac{l_m}{l_i}. \tag{128}$$

As a consequence, the ratio of impedances amounts to

$$\frac{Z_d}{Z_s} = \frac{P_d/v_d}{P_s/v_s} = \frac{A_s}{A_d}\left(\frac{l_i}{l_m}\right)^2. \tag{129}$$

Fig. 17. Schematic representation of the mechanical transformer of the middle ear. The
symbols $P_d$ and $A_d$ indicate the sound pressure at the eardrum and the effective area of
the eardrum; $P_s$ and $A_s$ indicate the corresponding parameters at the
entrance to the cochlea.

According to v. Békésy and Rosenblith (1951), the effective area of the
eardrum amounts to 0.55 cm², the effective area of the oval window, [...]
footplate, to 0.032 cm², and the lever ratio $l_m/l_i$ to 1.3. With these
numerical values,

$$\frac{Z_d}{Z_s} = 0.0345,$$

[...] from the theoretical value of [...].* The specific impedance of air is
[...], [so that] even in this idealized situation 40% of [...] from the
eardrum. This [...] for an acoustic system. In reality, [...] complex. The
middle ear adds its own impedance [...] transmission loss through imperfect
coupling.
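
The arithmetic of the ideal transformer can be checked with a minimal sketch. It uses the area and lever values quoted from v. Békésy and Rosenblith together with Eqs. 126 and 129. Two inputs are assumptions rather than values legible in this passage: a specific impedance of air of 41.5 cgs units, and a cochlear impedance of half the 11,000 cgs units of Eq. 116 (one possible reading of the footnote about the factor of two). The function names are hypothetical.

```python
def impedance_ratio(area_eardrum, area_window, lever_ratio):
    """Eq. 129: eardrum-to-oval-window impedance ratio of an ideal transformer."""
    return (area_window / area_eardrum) / lever_ratio ** 2


def transmitted_fraction(z1, z2):
    """Eq. 126: fraction of the incident intensity transmitted from z1 into z2."""
    r = z2 / z1
    return 4.0 * r / (1.0 + r) ** 2


ratio = impedance_ratio(area_eardrum=0.55, area_window=0.032, lever_ratio=1.3)

Z_AIR = 41.5                # specific impedance of air, cgs units (assumed)
Z_COCHLEA = 11_000.0 / 2.0  # Eq. 116 halved, following the footnote (assumed reading)

z_eardrum = ratio * Z_COCHLEA
reflected = 1.0 - transmitted_fraction(Z_AIR, z_eardrum)
print(round(ratio, 4), round(z_eardrum), round(reflected, 2))
```

Under these assumptions the ratio comes out close to the 0.0345 quoted in the text, and roughly 40% of the incident energy is reflected. Note that Eq. 126 is symmetric in the impedance ratio: a mismatch by a factor r costs the same energy as a mismatch by 1/r.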

[...] are the eardrum, the ossicular chain, and [...]. [When] the eardrum is
displaced, the air in the [...] impedance of the cavities affects the [...].
This impedance was measured by Onchi [...] specimen of temporal bone, and
was later corrected to fit [...] (Zwislocki, 1962). The motion of the
eardrum is imparted to the malleus of the ossicular chain, but only
incompletely, [...] absorbed by the flexibility of the eardrum. Thus, there
is [...] energy leak between the eardrum and the malleus. The [...] so that
no energy is [...] ossicles. However, some energy becomes absorbed by the
ligaments and muscles holding the ossicular chain in place. A further
energy leak occurs at the incudo-stapedial joint, which appears to provide
a considerable freedom of relative motion between the incus and the stapes
(Onchi, 1949, 1961; Zwislocki, 1957, 1962).

* The ratio between [...] at the entrance to the cochlea and the oval window
decreases the [...] cochlear impedance by a factor of two.

On the basis of the anatomy (Fig. 6) and of the consideration of energy
flow, the middle ear system may be represented by the block diagram of
Fig. 18 (Zwislocki, 1962). With the help of electroacoustic and
electromechanical analogies, the blocks may be considered as representing
electrical networks with the impedances $Z_1$, $Z_2$, etc. Then the input
impedance of the system, which is equivalent to the acoustic impedance at
the eardrum, follows from the equation

[Eq. 130 is not legible in the source.]

It is not difficult to see that an analysis of the middle ear function by
means of Eq. 130 is rather tedious, especially since each of the Z terms is a
complex expression when written in terms of the circuit elements. It is
easier to build an analog network and to match its characteristics
to those of the middle ear. Figure 19 shows the result of such an analysis
(Zwislocki, 1962). By comparing the groups of elements in Fig. 19 to the
position of the blocks in Fig. 18, it is possible to see which parts of the
middle ear they represent. The numerical values of the elements have
been determined on the basis of anatomical measurements and of
impedance measurements on normal and pathological ears. It may be of
interest to note that some pathologies, like otosclerosis or interrupted
[...] substantial simplification of the network (Zwislocki, 1957, 1962).

Fig. 18. Block diagram of the middle ear system. Adapted with permission from
Zwislocki (1962, p. 1515).

Fig. 19. [...] analog of the middle ear. Adapted with permission from Zwislocki [...].

[...] compare the input impedance of the analog of Fig. 19 [with that]
measured at the eardrum of normal ears in several experimental [series]
[...] range of the experimental data. [...] observations indicate that the
slightly increased resistance at low [frequencies is] due to an artifact in
measurements. [...] characteristics in the vicinity of [...] resonances and
antiresonances of the system. A similar [...] measurements on ear
preparations [...] determined on normal ears [...] maxima [...] varies
somewhat from person to [person] [...] smoothed out. The impedance values
above 2000 cps are not known [...] have been measured in only one
experimental series [...]. [...] it will not be possible to test the [...]
empirical data are gathered. The analog agrees with the [...] with respect
to the numerical values of all its [...]. [It is] highly probable,
therefore, that the analog transmission characteristic closely approximates
that [...]. This characteristic, expressed by [the] volume displacement
[...] pressure at the eardrum, is plotted in Fig. 21. [...] transmission
characteristic on live human beings, so that the curve of Fig. 21 is
probably the best available estimate. Both v. Békésy (1942) and Onchi
(1961) determined the transmission characteristic on ear preparations.
However, more recent comparative impedance measurements have revealed that
the mechanical properties of the middle ear change considerably after
death. The postmortem impedance at the eardrum is several times larger
than in a living subject (Zwislocki & Feldman, 1963). Accordingly,
v. Békésy's and Onchi's data indicate a transmission loss that is one order
of magnitude larger than in the analog. Nevertheless, the shape of the
transmission characteristic seems to remain reasonably well preserved, as
indicated by crosses in Fig. 21. Their position has been derived from
Onchi's measurements.

Fig. 20. (a) Acoustic reactance at the eardrum. The points indicate the average values
obtained in several experimental series; the curve was obtained on the network analog.
Adapted with permission from Zwislocki (1962, p. 1520). (b) Acoustic resistance at the
eardrum. The points indicate the average values obtained in several experimental series;
the curve was obtained on the network analog. Adapted with permission from Zwislocki
(1962, p. 1520).

Fig. 21. Transmission characteristic of the middle ear, that is, the volume displacement
of the stapes for a constant sound pressure at the eardrum, plotted as a function of sound
frequency. The crosses indicate data obtained on ear preparations; the curve was
obtained on the middle ear analog.

In order to be able to compute the transmission characteristic of the
whole ear, two more steps are necessary.


2.4 Transmission Characteristic of the Ear

The complete transmission characteristic of the ear can be obtained by
adding the pressure transformation in the ear canal and the pressure
change produced by sound diffraction at the head to the already calculated
transmission characteristic of the cochlea and the middle ear.

The ear canal can be regarded as acoustically equivalent to a rigid tube
of approximately 2.5 cm length and 0.7 cm diameter. Because of wave
reflection at the eardrum, the total pressure at any distance x from the
eardrum follows from Eq. 69:

[Eq. 131 is not legible in the source.]

Here $p_{ix}$ denotes the sound pressure of the incident wave and

$$\beta = \frac{Z_d - Z_c}{Z_d + Z_c}, \tag{132}$$

where $Z_c$ is the acoustic impedance of the ear canal in the absence of
reflections and $Z_d$ is the acoustic impedance at the eardrum. At the
eardrum, the sound pressure amounts to

[Eq. 133 is not legible in the source.]

so that the pressure ratio between the eardrum and the entrance to the
ear canal is

[Eq. 134 is not legible in the source.]

or, since [...],

[Eq. 135 is not legible in the source.]

Fig. 22. Network analog of the ear canal.

The pressure transformation can either be calculated from Eq. 135 or
determined on an analog network. For the latter purpose the ear canal
may be considered as a transmission line with a structure like that in
Fig. 22. The pressure transformation in the ear canal has been measured
by Wiener and Ross (1946) using a probe microphone. Since the probe
might have altered somewhat the acoustic conditions in the ear canal and
since the network model is only an approximation of the real conditions,
it is comforting to find that the analog results agree reasonably well with
Wiener's and Ross's data, as shown in Fig. 23.
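
A rough feel for the pressure transformation in the ear canal can be had from a minimal sketch. It treats the canal as a lossless uniform tube of 2.5 cm length and uses the textbook standing-wave expression for the pressure ratio between the closed end and the entrance. This expression is an assumption standing in for Eq. 135, which cannot be read in the source. The speed of sound and the real, frequency-independent reflection coefficient β = 0.5 are likewise illustrative choices, not values from the text.

```python
import cmath
import math

SPEED_OF_SOUND = 34_300.0  # cm/s in air (assumed)
CANAL_LENGTH = 2.5         # cm, as quoted in the text


def eardrum_gain(freq_cps, beta):
    """|p_eardrum / p_entrance| for a lossless uniform tube.

    beta is the pressure reflection coefficient at the eardrum.
    Standard standing-wave result (an assumption here):
    ratio = (1 + beta) * exp(-j k l) / (1 + beta * exp(-2 j k l)).
    """
    k = 2.0 * math.pi * freq_cps / SPEED_OF_SOUND
    phase = cmath.exp(-1j * k * CANAL_LENGTH)
    return abs((1.0 + beta) * phase / (1.0 + beta * phase ** 2))


quarter_wave = SPEED_OF_SOUND / (4.0 * CANAL_LENGTH)  # quarter-wavelength resonance
for f in (500.0, 1000.0, 2000.0, quarter_wave, 6000.0):
    gain_db = 20.0 * math.log10(eardrum_gain(f, beta=0.5))
    print(f"{f:7.0f} cps  {gain_db:5.1f} db")
```

In this idealization the gain peaks at the quarter-wavelength frequency, where it equals (1 + β)/(1 − β). A real eardrum has a complex, frequency-dependent impedance, so the sketch only indicates where the resonance should lie, not its measured height.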



Fig. 23. Sound pressure at the eardrum for a constant sound pressure at the entrance
to the ear canal, as a function of frequency in cycles per second. The solid curve
indicates average experimental results; the intermittent curve was obtained by means of
the ear analogs.

Fig. 24. Sound pressure at the entrance to the ear canal for a constant pressure measured
in a free field at the location of the center of a listener's head. Experimental data were
obtained by Wiener and Ross. Adapted with permission from Wiener & Ross (1946,
p. 407).

We shall not concern ourselves here with a calculation of sound
diffraction at the listener's head. The resulting change in sound pressure can
be measured without great difficulty, and Fig. 24 shows the corresponding
data obtained by Wiener and Ross (1946).

The total transmission characteristic of the ear is a simple decibel sum
of the four component characteristics of Figs. 16, 21, 23, and 24 (see also
Zwislocki, 195[?]). It is shown in a slightly modified form in Sec. 4. With
its help it is possible to transfer approximately the stimulus specification
from the free field to the level of sensory cells. This transformation,
together with certain additional characteristics of the cochlea, will help us
to interpret some of the known psychoacoustic functions.


3. PITCH FUNCTION

Pitch is a psychological attribute of sound. It depends mainly on sound
frequency, but it also varies with sound intensity and stimulus duration.
The pitch produced by complex sounds does not necessarily correspond to
a spectral frequency component, but it may depend on the periodicity of
the envelope (Schouten, 1940). For such conditions, the term “periodicity
pitch” has been proposed. In this section, attention is focused on the
classical concept of pitch, that is, pitch that is associated with pure tones.

In everyday life, we order pitch on an ordinal scale from low to high.
Stevens, Volkmann, and Newman (1937) succeeded in constructing a
numerical interval scale by means of a method of fractionation. They
presented alternately tones of fixed and variable frequency, and asked
the observers to adjust the variable stimulus until its pitch appeared
to be half the pitch of the fixed reference stimulus. They called the unit
of pitch a “mel” and defined it by assigning a pitch of 1000 mels to a tone
of 1000 cps at a loudness level of 60 db. Subsequently, Stevens and
Volkmann (1940) revised the original mel scale on the basis of
fractionation and equisection experiments. The revised mel scale is plotted in
Fig. 25.

More recently, Stevens and Galanter (1957) have shown that, under
certain precautions, methods of magnitude estimation and category
judgments yield results that are consistent with the mel scale. Their
magnitude estimation data are indicated by open circles in Fig. 25. A
similar conclusion was reached by Beck and Shaw (1962), who found that,
depending on the reference standard, magnitude estimation can agree
either with the original or with the revised mel scale.

Thus, within the range determined by the data of Fig. 25, the pitch
scale appears to be fairly well established. Its validity is strengthened by
the simple relationship it bears to frequency jnd's, critical bands, and other
auditory parameters (Licklider, 1951), as well as to anatomical and
physiological findings (Stevens & Davis, 1938; Wever, 1949).

Fig. 25. Pitch as a function of sound frequency for a loudness level of 60 db. The solid
line shows results of fractionation and equisection, the circles are those of magnitude
estimation, and the intermittent line resulted from an integration of frequency jnd's.

Out of a considerable number of jnd studies, those of Shower and
Biddulph (1931) and of Zwicker (1952) are probably the most extensive.
In both, pitch variation was produced by means of frequency modulation.
The simplest frequency modulated tone may be described by the
equation

$$p = P\cos\left(2\pi f_0 t - \frac{\delta f}{f_m}\cos 2\pi f_m t\right), \tag{136}$$

where p is the sound pressure, P its amplitude, $f_0$ the center frequency,
$f_m$ the frequency of modulation, and $2\delta f$ the frequency range. At slow
modulation rates, a periodically varying pitch is perceived; at higher
modulation rates, a steady, complex sound. The latter effect is produced
as a result of spectrum analysis in the auditory system. When a frequency
modulated signal is transmitted through a bank of narrow band filters,
several fixed frequencies instead of a varying frequency can be detected
at the output. For moderate $\delta f$ and sufficiently high $f_m$, such that
$\delta f/f_m \ll 1$, three predominating spectral components arise: $f_0$, the
center component, and the sidebands $f_0 + f_m$ and $f_0 - f_m$ [...],
respectively. When the difference between the spectral frequencies, that
is, $2f_m$, is sufficient, the auditory system behaves in some ways like a
bank of filters, and it resolves the frequency modulated signal into its
steady-state [...] a just noticeable frequency difference [...] determined,
although its magnitude is expressed as the amplitude of the sidebands.
This amplitude, relative to that of the center component, is equal to
$\delta f/2f_m$.
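
The claim that a weakly frequency modulated tone reduces to a center component with two sidebands of relative amplitude δf/2f_m can be checked numerically. The following minimal sketch samples Eq. 136 with unit amplitude and measures individual spectral components by direct Fourier correlation. The parameter values (f_0 = 1000 cps, f_m = 100 cps, δf = 10 cps) and the sampling settings are hypothetical, chosen so that δf/f_m = 0.1 and every component completes a whole number of periods.

```python
import cmath
import math


def spectral_amplitude(samples, freq, sample_rate):
    """Amplitude of the component at `freq`, by direct Fourier correlation."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * freq * i / sample_rate)
              for i, s in enumerate(samples))
    return 2.0 * abs(acc) / n


f0, fm, delta_f = 1000.0, 100.0, 10.0  # hypothetical values, delta_f / fm = 0.1
rate, n = 16_000, 1600                 # 0.1 s: whole periods of all components

# Eq. 136 with unit pressure amplitude P = 1
tone = [math.cos(2 * math.pi * f0 * i / rate
                 - (delta_f / fm) * math.cos(2 * math.pi * fm * i / rate))
        for i in range(n)]

carrier = spectral_amplitude(tone, f0, rate)
upper = spectral_amplitude(tone, f0 + fm, rate)
lower = spectral_amplitude(tone, f0 - fm, rate)
second = spectral_amplitude(tone, f0 + 2 * fm, rate)
print(carrier, upper, lower, second)
```

With these settings the two first-order sidebands come out very close to δf/2f_m = 0.05, while the second-order component is more than an order of magnitude smaller, which is why only three components predominate when δf/f_m is small.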
Since, at low modulation rates, $2\delta f$ defines a frequency range and, at
high modulation rates, $2f_m$ does the same, both can be used as different
measures of the resolving power of the auditory system. First, Fig. 26
shows two series of jnd values plotted in terms of $2\delta f/f$ against center
frequency. The Shower and Biddulph data were obtained at a sensation
level of 60 db with a modulation rate $f_m = 2$ cps. At this modulation
rate the jnd reaches a minimum. Zwicker's data were obtained at a sound
pressure level of 80 db and a modulation rate of 4 cps. Their higher
numerical values are probably due to the higher modulation rate. Otherwise,
they appear to be consistent with those of Shower and Biddulph, and they
show approximately the same functional relationship to the center
frequency. At low frequencies, the absolute jnd appears to be
approximately constant; at high frequencies, it is roughly proportional to
frequency, and more exactly to its 3/2 power. Zwicker's results can be
described quite accurately by the equation

$$2\delta f \approx a f^{3/2} + b, \tag{137}$$

or

$$\frac{2\delta f}{f} \approx a f^{1/2} + b f^{-1}, \tag{138}$$

where $a \approx 9 \times 10^{-5}$ cps$^{-1/2}$ and $b \approx 3$ cps.

Equation 137 is characteristic not only for frequency but also for
differential sensitivity in general. Its constant term indicates an absolute
sensitivity maximum that persists up to a certain stimulus level. At
higher levels, the sensitivity decreases, indicating a self-adapting system,
that is, a system whose characteristics are controlled to a certain extent by
the input signal. A stimulus-dependent sensitivity is essential in handling
[...] typical of sensory organs.

The independent determination of the pitch scale and of the frequency
jnd's provides an opportunity to test Fechner's hypothesis of the equal
psychological magnitude of just noticeable increments. It is only necessary
to compare the size of the jnd's to that of the mel steps. For this purpose,
either the pitch function of Fig. 25 can be differentiated with respect to
sound frequency or the jnd function of Fig. 26 can be integrated. Let us
follow the classical procedure and integrate the jnd's (Stevens & Davis,
1938).

If $\Delta f = 2\delta f$ denotes the physical size of the jnd, then $n = 1/\Delta f$ is the
number of jnd steps per cycle. There are

$$N = \int_{f_1}^{f_2} n\,df \tag{139}$$

steps between the frequency limits $f_1$, $f_2$. Using Eq. 137, we obtain

$$N = \int_{f_1}^{f_2} \frac{df}{a f^{3/2} + b}. \tag{140}$$

The integration can be performed graphically, or analytically by
approximation. In the first instance, it is convenient to plot $f/\Delta f$ against
$\ln f$ and measure the area bounded by [...] and the $f/\Delta f$ curve. That this is
equivalent to the integral, Eq. 139, results from the following consideration:

$$d(\ln f) = \frac{1}{f}\,df;$$

consequently,

$$n f\,d(\ln f) = n\,df.$$

In order to perform an analytical integration, we note that, for numerical
values of Fig. 26, the equation

$$\Delta f = kf + l, \tag{141}$$

with $k = 4 \times 10^{-3}$ and $l = 2.5$ cps, describes the jnd curve almost as well
as Eq. 137. Thus

$$N = \int_{f_1}^{f_2} \frac{df}{kf + l} = \frac{1}{k}\ln\frac{kf_2 + l}{kf_1 + l}. \tag{142}$$

[With the lower limit] at $f = 0$, the number of jnd's as a function
of frequency is

$$N = \frac{1}{k}\ln\left(\frac{k}{l}f + 1\right) = 2.5 \times 10^{2}\,\ln(1.6 \times 10^{-3} f + 1). \tag{143}$$

[...] are plotted in Fig. 25. The result [...] exactly with the mel scale,
leading to [...] magnitude [...]

1 jnd ≈ 4.5 mels.

[...] between the [...] approximately for all sensation levels, although
the absolute [...] (Stevens & Davis, 1938). Although the [...] of frequency
jnd's is of considerable theoretical interest, its importance should not be
overestimated, since the rule does not hold for differential sensitivity
along other continua.
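
The integration of jnd's described above can be sketched numerically. The example below assumes a linear jnd function of the form Δf = kf + l, with illustrative constants k = 4 × 10⁻³ and l = 2.5 cps that should be treated as assumptions rather than as values confirmed by the source. It compares the closed-form logarithmic result with a direct numerical integration over frequency and with the equivalent integration of f/Δf over ln f, which is the basis of the graphical method. All function names are hypothetical.

```python
import math

K = 4.0e-3  # slope of the linear jnd approximation (assumed value)
L = 2.5     # jnd in cps as f approaches zero (assumed value)


def jnd_size(freq_cps):
    """Linear approximation of the frequency jnd, delta_f = K * f + L."""
    return K * freq_cps + L


def jnd_count_closed_form(f1, f2):
    """Number of jnd steps between f1 and f2 from the logarithmic formula."""
    return math.log((K * f2 + L) / (K * f1 + L)) / K


def jnd_count_numeric(f1, f2, steps=100_000):
    """Midpoint-rule integration of 1 / jnd_size over [f1, f2]."""
    h = (f2 - f1) / steps
    return sum(h / jnd_size(f1 + (i + 0.5) * h) for i in range(steps))


def jnd_count_log_axis(f1, f2, steps=100_000):
    """Same integral taken over ln f, with integrand f / jnd_size."""
    a, b = math.log(f1), math.log(f2)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        f = math.exp(a + (i + 0.5) * h)
        total += h * f / jnd_size(f)
    return total


print(jnd_count_closed_form(100.0, 10_000.0))
print(jnd_count_numeric(100.0, 10_000.0))
print(jnd_count_log_axis(100.0, 10_000.0))
```

The three results agree to within the integration error, which illustrates why the area under the f/Δf curve on a logarithmic frequency axis gives the same jnd count as the integral of 1/Δf over frequency.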
We next consider frequency [...] changes [...] perceived and the sensation
becomes that of a steady complex sound. For this purpose, Eq. 136 can be
expressed in terms of spectral components:

$$p = P\left[\cos 2\pi f_0 t + \frac{\delta f}{2f_m}\sin 2\pi(f_0 + f_m)t + \frac{\delta f}{2f_m}\sin 2\pi(f_0 - f_m)t\right]. \tag{144}$$

Zwicker (1952) measured the just noticeable amplitude ratio between the
sidebands and the carrier for various sensation levels and carrier
frequencies. Figure 27 shows his results for $f_0 = 1000$ cps. He also measured
the just noticeable amplitude ratio for an amplitude-modulated tone as a
function of modulation frequency and found that, at low frequencies,
the auditory system is much more sensitive to amplitude modulation than
to frequency modulation. At high modulation frequencies, the difference
vanished, however (Fig. 27). This is of considerable theoretical interest
since, for small modulation amplitudes, the spectrum of an amplitude
modulated oscillation is the same as that of a frequency modulated one
except for the phase difference between the sidebands and carrier.
In a frequency modulated oscillation, the phase difference amounts to
π/2; in an amplitude modulated oscillation, it is zero. The spectrum of
the latter can be expressed as follows:

$$p = P\left[\cos 2\pi f_0 t + \tfrac{1}{2} m \cos 2\pi(f_0 + f_m)t + \tfrac{1}{2} m \cos 2\pi(f_0 - f_m)t\right]. \tag{145}$$
Since the spectral phase relation is the only differentiating parameter
between the two kinds of oscillation, Zwicker concluded that the ear is
phase sensitive for low modulation frequencies, that is, when the
frequency difference between the sidebands is small. Beyond a certain
critical spectrum width, the phase sensitivity disappears. A further
investigation revealed that the critical spectrum width depends on the
carrier frequency in the same way as do the frequency jnd's at low
modulation rates and the size of mel steps. This means that the critical spectrum
width is equal to a constant number of mels. It is possible to deduce
from Zwicker's data that, for 1000 cps tones, this number is approximately
115 mels, independent of sensation level.

The critical spectrum width manifests itself in several kinds of auditory
experiments and may be regarded as a fundamental concept in auditory
theory (Feldkeller & Zwicker, 1956; Zwicker, Flottorp, & Stevens, 1957).
The concept was introduced previously by Fletcher (1940) and called
critical band. The absolute size of Fletcher's critical band resulted from
postulates rather than from empirical data, although a direct measurement
was attempted. For this reason, Zwicker, Flottorp, and Stevens (1957)
suggested that it be called the critical ratio and that the name critical band
be transferred to the more recently modified concept. Fletcher determined
the size of the critical ratio as a function of sound frequency, and he
found a similar relationship to that for the critical band. As a consequence,
the quotient of the critical band and the critical ratio is practically
independent of frequency; it amounts to approximately 2.5.

The similarity between pitch, frequency jnd, and critical band functions
suggests the hypothesis that all three have a common physiological
substrate. Helmholtz (1863) postulated that pitch is determined by the
maximally excited nerve fibers and that this excitation is controlled by the
locus of maximum vibration amplitude in the cochlea. The mechanism he
invoked to explain the occurrence of such a maximum may have been
inaccurate, but v. Békésy (1942, 1943) has demonstrated that the maximum
[...] frequency in the way Helmholtz [...]. [...] it is situated near the
helicotrema; for [...]. Theories linking pitch to [...] the cochlear canals
are now called place theories. Some [...] postulates that may be proposed
with respect to a place theory are (1) [pitch is] directly proportional to
the distance between the vibration maximum [and the] helicotrema and
(2) the size of mel steps, jnd's, and [...] inversely proportional to the
derivative of the distance with respect to sound frequency.

[The] functional relationship between the locus of maximum
vibration and sound frequency was derived theoretically on the basis of the
anatomy and the physical constants of the cochlea. It was shown that the
obtained relation is consistent with empirical measurements. An
expression was found that determines the amplitude of vibration $\eta$ of the
cochlear partition as a function of the distance $x$ from the oval window
(Eq. 103). The maximum amplitude occurs where the distance derivative
of $\eta$ vanishes, $d\eta/dx = 0$. This condition leads to an expression of the
form

$$x = A\ln\frac{(b^2 - 4acf^2)^{1/2} - b}{2af^2}, \tag{146}$$

where A, a, b, and c are positive constants depending on the physical
parameters of the cochlea. In order to obtain the distance from the
helicotrema, we introduce $x_h = l - x$, with $l$ indicating the length of
cochlear canals. Thus

$$x_h = l + A\ln\frac{2af^2}{(b^2 - 4acf^2)^{1/2} - b}. \tag{147}$$

The first postulate is satisfied if Eq. 147 is identical with Eq. 143, which
determines the number of jnd steps as a function of frequency. Clearly,
this is not the case. The numerical difference between the mel scale and the
$x_h$ function may be derived from Fig. 28. We must conclude that pitch is
not directly proportional to the distance of the locus of maximum oscillation
from the helicotrema. Still, the curves of Fig. 28 are sufficiently similar to
entertain a hypothesis of close correlation between the two variables. In
order to clarify the situation somewhat, we may attempt to determine the
size of jnd's or mel steps in terms of length intervals along the cochlear

Fig. 28. Pitch and the location of the vibration maximum along the cochlear partition
as a function of sound frequency.
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Fig 29 Relative frequency jnd and the relative frequency change for a constant shift 
of the vibration maximum along the cochlear partition and for a shift of the maximum 
by a constant number of ganglion cells 


partition For this purpose, we take the derivative of Eq 147 with respect 
to/ ^ 

^ = !«££ 1 1 (148) 

df I2ap 2(b^ + 4ac/)‘-^ + Aaep)"'^ - hj ^ 

This determines the distance that the maximum of vibration is shifted 
along the cochlear partition when the sound frequency changes by one 
cycle By taking l/Ctfe^/d/), which is equivalent to dffdx^, we obtain the 
frequency change per unit distance The quotient divided by the sound 
frequency and multiplied by 4 5 10-»cm is plotted in Fig 29 together 
with the relative jnd, A///' It is apparent that at frequencies above 500 ops 

A/ 1 df 

- 7 = 7 : 7 ^x 45 x 10 -’, (149) 

f fdx^ 

so that, letting df= A/, we obtain 


Aa:, = 4 5 X 10-’ cm, (150) 

Accordingly, the jnd, pitch, and critical-band increments are
approximately proportional to distance increments along the cochlear
partition, and the second postulate is satisfied. At lower frequencies no
such proportionality exists, however, so that the relationship at the
higher frequencies must be considered coincidental rather than of basic
significance.

A more significant relation is found when the distribution of the
stimulus is plotted in terms of the distribution of nerve endings rather
than in terms of distances from the helicotrema. Such a procedure seems
justified since, in the end, the frequency information must be transmitted
by the nervous system.

The sensory cells in the cochlea are innervated by approximately 30,000
neurons which have their cell bodies in the ganglion spiralis. Wever
(1949) made a thorough count of these ganglion cells, determining their
density per millimeter of the basilar membrane. In the basal and middle
turns of the cochlea, the neural density appears to be nearly constant and
amounts to approximately 1150 ganglion cells per millimeter. In the apical
turn, the density decreases toward the helicotrema according to a
monotonic function. This function may be expressed in the form

$$y = \frac{\sigma_x}{\sigma_0}, \tag{151}$$

with $\sigma_x$ denoting the local innervation density and $\sigma_0$ the
average innervation density in the basal and middle turns. If now the
expression $(1/f)(df/dx_M) \times 4.5 \times 10^{-3}$ is divided by $y$,
the dotted curve in Fig 29 results. This curve agrees considerably better
with the relative jnd curve than does the plot of the expression
$(1/f)(df/dx_M) \times 4.5 \times 10^{-3}$. The remaining difference can
easily be accounted for by the errors of the psychophysical as well as of
the anatomical and physiological measurements. We conclude therefore that
equal distances on the pitch scale correspond to equal numbers of
peripheral neurons. More specifically,

1 mel = 12 neurons,

1 jnd = 52 neurons,

1 CB = 1300 neurons,

where CB denotes critical band.
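As a rough arithmetic check on these figures, the neuron counts can be converted back into distances along the basilar membrane. The sketch below assumes the constant density of about 1150 ganglion cells per millimeter quoted for the basal and middle turns; the function and variable names are illustrative only and do not come from the text.

```python
# Convert neuron counts per pitch unit into membrane distances, assuming
# the constant basal/middle-turn density quoted in the text.
GANGLION_CELLS_PER_MM = 1150.0

NEURONS_PER_UNIT = {"mel": 12, "jnd": 52, "critical band": 1300}


def distance_cm(neurons: float) -> float:
    """Length of basilar membrane (cm) innervated by `neurons` cells."""
    return neurons / GANGLION_CELLS_PER_MM / 10.0  # mm -> cm


if __name__ == "__main__":
    for name, count in NEURONS_PER_UNIT.items():
        print(f"1 {name}: {distance_cm(count):.4f} cm")
    # Number of jnd's contained in one critical band, in neuron terms.
    print(NEURONS_PER_UNIT["critical band"] / NEURONS_PER_UNIT["jnd"])
```

Under this assumption one jnd corresponds to roughly $4.5 \times 10^{-3}$ cm, the distance appearing in Eq 150, and one critical band spans 25 jnd's.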

This conclusion is encouraging to those who attempt to relate the
subjective scales to innate properties of the organism rather than view
them entirely as a result of learning.

In order to visualize what a critical band means in terms of the vibration
pattern in the cochlea, consult Fig 12. The relative position of the two
curves that indicate the amplitude distribution along the cochlear
partition corresponds to two critical bands. A distance equivalent to one
jnd is about fifty times smaller. Such sensitivity is astounding, and it
is difficult to conceive how it arises without assuming an additional
mechanism that emphasizes the amplitude differences. Such a mechanism is
discussed in Sec 5.

4. THRESHOLD OF AUDIBILITY

The concept of threshold is a controversial matter in psychophysics. It
has even been suggested that, in a strict sense, psychophysical thresholds
do not exist (see Chapter 3, Vol 1). Nevertheless, it is possible to
define a threshold as a stimulus intensity that, under given conditions,
produces a certain probability of signal detection. A strong auditory
stimulus can be detected practically 100% of the time when no interfering
noise is present. When its intensity is decreased, the probability of
detection becomes gradually smaller and finally approaches zero.
Conventionally, the threshold intensity is defined as the intensity that
produces a 50% chance of detection. This intensity may vary from situation
to situation, and we shall analyze some of the physical and physiological
parameters; higher-order psychological factors are discussed in another
chapter.

The threshold of audibility, as previously defined, depends on the
temporal pattern of the stimulus and on the energy distribution along the
frequency scale. Since the second relationship depends in part on the
first, it appears reasonable to discuss them in that order.

4.1 Threshold of Audibility as a Function of Temporal Stimulus Pattern

When the duration of a short tone burst is increased, its threshold of
audibility decreases at the approximate rate of 3 db per doubling of
duration. This is true up to a duration of approximately 200 msec, at
which point the threshold intensity approaches an asymptote.

A departure from this rule appears to occur at extremely short durations.
This is due to changes in the frequency spectrum (Garner, 1947). An
infinitely long tone has only one spectral line, but a short tone burst
occupies a frequency band whose width $\Delta f$ is inversely proportional
to its duration $T$. Under these conditions, the sound energy is
distributed over the whole band, and the threshold may be raised or
lowered depending on the variation of the sensitivity of the auditory
system with frequency. Garner (1947) and Scholl (1961) have analyzed the
phenomenon in detail from different points of view. We neglect it in the
following considerations, where it is of secondary importance.

Empirical data obtained in three independent studies for a 1000 cps tone
are shown in Fig 30 (Garner, 1947; Feldtkeller & Oetinger, 1956;
Zwislocki, 1960). For durations greater than 10 msec, they can be
approximated quite well by a curve obeying the equation

$$\frac{I_t}{I_\infty} = \frac{1}{1 - e^{-\alpha t}}, \tag{152}$$

where $I_t$ denotes the threshold intensity of the burst, $I_\infty$ the
threshold intensity of a continuous tone, and $t$ the burst duration. The
equation describes a simple energy integrator with a time constant
$1/\alpha = 200$ msec (Munson, 1947; Feldtkeller & Oetinger, 1956). As a
consequence, we assume such an integrator as a model and investigate how
well it predicts the threshold of audibility for various time patterns of
the stimulus. Other, probabilistic, models have been suggested, but their
validity was not tested beyond the function of threshold intensity versus
duration (Garner & Miller, 1947; Green, Birdsall, & Tanner, 1957).
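The behavior of the energy integrator can be tabulated with a short sketch. It assumes that Eq 152 has the form $I_t/I_\infty = 1/(1 - e^{-\alpha t})$ with $1/\alpha = 200$ msec; the function name and the list of durations are illustrative choices, not part of the original treatment.

```python
import math

TIME_CONSTANT_MS = 200.0  # 1/alpha, as quoted in the text


def burst_threshold_shift_db(duration_ms: float,
                             tau_ms: float = TIME_CONSTANT_MS) -> float:
    """Threshold of a burst relative to a continuous tone, in db.

    Assumes I_t / I_inf = 1 / (1 - exp(-t / tau)), the reading of
    Eq 152 adopted here.
    """
    return -10.0 * math.log10(1.0 - math.exp(-duration_ms / tau_ms))


if __name__ == "__main__":
    for t in (10, 20, 40, 80, 160, 320, 640, 2000):
        print(f"{t:5d} msec: {burst_threshold_shift_db(t):6.2f} db")
```

For durations well below the time constant the computed threshold falls by close to 3 db per doubling of duration, and for durations well above it the shift tends to zero, which is the pattern described at the beginning of this subsection.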

Physiological studies suggest that, from the point of view of the
peripheral nervous system, each wave of sinusoidal vibration may be
considered as an independent stimulus. Thus, when the duration of a tone
is varied, the number of elemental stimuli is varied simultaneously. These
variables may be separated by using pulse sequences and changing the
number of pulses while keeping the duration of the sequence constant, or
vice versa (Zwislocki, 1960). A drawback in such experiments is the
spectral distribution of energy. The major portion of the spectrum of a
pulse that is $\tau$ sec long is included between 0 and $1/\tau$ cps. The
pulse repetition rate determines the density of spectral components. The
difficulty may be overcome by high repetition rates (Zwislocki, Hellman, &
Verrillo, 1962). This is particularly true when the pulses are transmitted
through a selective network emphasizing a relatively narrow band of
frequencies. The same goal may be achieved by using sequences of short
tone bursts when the tone frequency is sufficiently high (Scholl, 1961).

Fig 31. Threshold of audibility of pulse pairs as a function of time
interval between the pulses. The points are experimental; the curve is
theoretical. Adapted with permission from Zwislocki (1960, p 1047).

In one experiment the threshold of pairs of pulses was measured as a
function of time interval between the pulses (Zwislocki, 1960). The closed
circles in Fig 31 show the experimental data. They are fitted quite well
by the curve drawn in the same figure, which obeys the equation

$$\mathrm{TS} = -10 \log\,(1 + e^{-\alpha t}). \tag{153}$$

This equation can be rewritten in the form

$$I_2 = \frac{I_1}{1 + e^{-\alpha t}}, \tag{154}$$

in which $I_1$ denotes the threshold intensity of a single pulse, $I_2$
the threshold intensity of the pulse pair, and $t$ the time interval
between the pulses. In this form, it suggests a simple neural mechanism
which may be described in the following way (Zwislocki, 1960).

Certain portions of the nervous system function in a quantal manner and
produce the so-called all-or-none responses. The response of other
portions is graded. Axon cylinders exhibit the first kind of response;
dendrites, synaptic junctions, and muscle endplates the second. In the
presence of a graded response, the resultant excitation appears to be a
simple sum of elemental excitations produced by incoming stimuli. After
the stimulation has been terminated, the excitation decays approximately
according to an exponential function

$$\eta = \eta_0 e^{-\alpha t}. \tag{155}$$

If we postulate the graded response of the nervous system to be the
mechanism controlling the threshold of audibility as a function of the
temporal stimulus pattern, we can write for the pulse pair

$$\eta = \epsilon_1 e^{-\alpha t} + \epsilon_2, \tag{156}$$

where $\eta$ is the total excitation immediately after the second pulse
has occurred, $\epsilon_1$ the excitation produced by the first pulse, and
$\epsilon_2$ the excitation produced by the second pulse. The equation
describes correctly the measured threshold of pulse pairs when
$\epsilon_1 = \epsilon_2$ and $\epsilon = CI$. These requirements should
be in agreement with the known functioning of the auditory nervous system.
It is possible to show that they are.
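The pulse-pair prediction can be sketched numerically. The snippet assumes that Eq 156 reads $\eta = \epsilon_1 e^{-\alpha t} + \epsilon_2$ with $\epsilon_1 = \epsilon_2 = CI$ and that the threshold corresponds to a fixed total excitation, so that the pair threshold relative to a single pulse is $-10 \log (1 + e^{-\alpha t})$. The 200 msec time constant is carried over from Eq 152, and the function name is hypothetical.

```python
import math


def pair_threshold_shift_db(interval_ms: float, tau_ms: float = 200.0) -> float:
    """Threshold of a pulse pair relative to a single pulse, in db.

    The first pulse leaves a residual excitation exp(-t/tau) at the time
    of the second pulse; equal excitation at threshold then requires
    I_2 * (1 + exp(-t/tau)) = I_1.
    """
    return -10.0 * math.log10(1.0 + math.exp(-interval_ms / tau_ms))


if __name__ == "__main__":
    for t in (0, 25, 50, 100, 200, 400, 800):
        print(f"{t:4d} msec: {pair_threshold_shift_db(t):6.2f} db")
```

At zero separation the two pulses add fully and the threshold is about 3 db below that of a single pulse; for separations long compared with the time constant the advantage disappears.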
Without much doubt, it should be possible to express the functional
relationship between the excitation $\epsilon$ and the stimulus intensity
$I$ by means of a series expansion of the form

$$\epsilon = a_1 I + a_2 I^2 + a_3 I^3 + \cdots. \tag{157}$$

According to physiological evidence, the peripheral auditory system
exhibits considerable spontaneous activity (Tasaki, 1957). We can consider
this activity as if it were produced by a stimulus intensity $I_0$. Then,
denoting the intensity of the extrinsic stimulus by $I_x$, we obtain

$$\epsilon = a_1 (I_0 + I_x) + a_2 (I_0 + I_x)^2 + \cdots. \tag{158}$$

This series can be rewritten in the form

$$\epsilon = a_1 I_0 \left(1 + \frac{I_x}{I_0}\right) + a_2 I_0^2 \left(1 + \frac{I_x}{I_0}\right)^2 + \cdots. \tag{159}$$

On the basis of masking experiments, which are discussed in Sec 5, we can
assume that $I_x \ll I_0$ in the immediate vicinity of the threshold of
audibility, so that

$$\epsilon \approx a_1 I_0 \left(1 + \frac{I_x}{I_0}\right) + a_2 I_0^2 \left(1 + \frac{2 I_x}{I_0}\right) + \cdots \tag{160}$$

or

$$\epsilon \approx a_1 I_0 + a_2 I_0^2 + \cdots + (a_1 + 2 a_2 I_0 + \cdots)\, I_x. \tag{161}$$

Thus, independent of the exact relationship between the excitation and the
stimulus intensity, the excitation produced by the extrinsic stimulus can
be written in the form

$$\epsilon_x = C I_x, \tag{162}$$

where $C = a_1 + 2 a_2 I_0 + \cdots$ = constant. The direct
proportionality between the excitation and stimulus intensity holds only
near the threshold; at higher intensities more complex phenomena seem to
take place. They are discussed in Sec 6.

Now, it is necessary to demonstrate that the requirement
$\epsilon_1 = \epsilon_2$ is consistent with direct observations. McGill
and Rosenblith (1951, 1952) have investigated neural responses to pairs of
short pulses. They found that, in general, a preceding pulse decreases the
response to the following one. This does not happen at very small
intensities, however. The phenomenon can be explained in terms of neural
response probability. Although at high intensities a considerable
percentage of available neural units respond to the first pulse, only a
few units respond at intensities near the threshold. Denoting the
probability of response to the first pulse by $p$, we obtain for the
number of available units at the time of occurrence of the second pulse

$$N_1 = N[1 - p + K(\tau)p], \tag{163}$$

where $N$ is the total number of available units and $K(\tau)$ is the
fraction of units that have had time to recover before the second
stimulation. If $p$ is very small, $N_1 \approx N$. This demonstration is
consistent with the assumption that $I_x \ll I_0$, since under such
conditions $I_0$, not $I_x$, controls the number of available units.
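The linearization behind Eq 162 can be checked numerically. The sketch below takes an arbitrary truncated power series for the excitation (the coefficients and intensities are hypothetical values chosen only for illustration) and compares the exact excitation increment caused by a weak extrinsic stimulus with the linear estimate $C I_x$, where $C$ is the derivative of the series at the spontaneous-activity level $I_0$.

```python
def excitation(intensity: float, coeffs: tuple) -> float:
    """Power series a1*I + a2*I**2 + ... with coeffs = (a1, a2, ...)."""
    return sum(a * intensity ** (k + 1) for k, a in enumerate(coeffs))


def slope_constant(i0: float, coeffs: tuple) -> float:
    """C = a1 + 2*a2*I0 + 3*a3*I0**2 + ..., the slope of the series at I0."""
    return sum((k + 1) * a * i0 ** k for k, a in enumerate(coeffs))


if __name__ == "__main__":
    coeffs = (1.0, 0.3, 0.05)   # hypothetical a1, a2, a3
    i0, ix = 1.0, 0.01          # hypothetical intensities with Ix << I0
    exact = excitation(i0 + ix, coeffs) - excitation(i0, coeffs)
    linear = slope_constant(i0, coeffs) * ix
    print(exact, linear)
```

As long as $I_x$ is small compared with $I_0$, the two values agree closely whatever coefficients are chosen, which is the sense in which the result is independent of the exact excitation function.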

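We now investigate whether the theory developed for a pulse pair holds
for any number of pulses and also for sinusoidal oscillations (Zwislocki,
1960). For $n$ pulses with a repetition rate $\nu = 1/\Delta t$, we obtain

$$\eta_n = \epsilon\,(e^{-(n-1)\alpha \Delta t} + \cdots + e^{-2\alpha \Delta t} + e^{-\alpha \Delta t} + 1), \tag{164}$$

which, for $n \to \infty$, is a series expansion of

$$\eta_\infty = \frac{\epsilon}{1 - e^{-\alpha \Delta t}}. \tag{165}$$

The ratio between the excitation at the end of $n$ pulses and the
asymptotic excitation is

$$\frac{\eta_n}{\eta_\infty} = (1 - e^{-\alpha \Delta t}) \sum_{i=0}^{n-1} e^{-i \alpha \Delta t}. \tag{166}$$

It does not seem unreasonable to assume that $\eta_n = \eta_\infty$ =
const at the threshold of audibility. Consequently, with $\epsilon = CI$,
we obtain

$$\frac{I_n}{I_\infty} = \frac{1}{(1 - e^{-\alpha \Delta t}) \sum_{i=0}^{n-1} e^{-i \alpha \Delta t}}, \tag{167}$$

which can be simplified to

$$\frac{I_n}{I_\infty} = \frac{1}{1 - e^{-\alpha t}}, \tag{168}$$

or in decibel notation

$$\mathrm{TS} = 10 \log I_n - 10 \log I_\infty = -10 \log\,(1 - e^{-\alpha t}), \tag{169}$$

where TS means threshold shift and $t = n\,\Delta t$ is the duration of
the pulse sequence.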
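The simplification from the finite sum to the closed form is a geometric-series identity, and it can be confirmed with a few lines of code. The sketch assumes the sum runs over $i = 0, \ldots, n-1$ with a decay factor $e^{-\alpha \Delta t}$ per pulse interval; the function names and the 200 msec default are illustrative.

```python
import math


def buildup_ratio_sum(n: int, dt_ms: float, tau_ms: float = 200.0) -> float:
    """eta_n / eta_inf obtained by adding the n decayed pulse contributions."""
    q = math.exp(-dt_ms / tau_ms)          # decay over one pulse interval
    return (1.0 - q) * sum(q ** i for i in range(n))


def buildup_ratio_closed(n: int, dt_ms: float, tau_ms: float = 200.0) -> float:
    """Closed form 1 - exp(-t / tau) with t = n * dt."""
    return 1.0 - math.exp(-n * dt_ms / tau_ms)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, buildup_ratio_sum(n, 1.0), buildup_ratio_closed(n, 1.0))
```

The two columns coincide, so the threshold of the sequence depends only on its total duration $t = n\,\Delta t$ and not separately on the number of pulses or their spacing.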

Equations 168 and 169 are analogous to Eq 152, so that the theory
describes accurately the threshold of audibility of a 1000 cps tone as a
function of duration. Experiments have shown that the shape of the
threshold curve does not change with frequency as long as the main portion
of the frequency spectrum is confined to one critical band (see Secs 3 and
5). As a consequence, the theory holds independent of sound frequency.

In the two-pulse experiment the duration of the sequence is varied
without varying the number of elemental stimuli. When the procedure is
reversed and the number of pulses is varied in a constant time interval,
then, by letting $n$ in Eq 168 vary from 1 to some higher number, we can
write

$$\frac{I_n}{I_1} = \frac{1 - e^{-\alpha t/n}}{1 - e^{-\alpha t}}. \tag{170}$$

The closed circles in Fig 32 show experimental data for a sequence
duration $t = 80$ msec. The crosses in the same figure indicate the
theoretical predictions. The largest difference is 1.5 db (at $n = 9$),
and that is within the range of experimental error.



Fig 32. Threshold of audibility for short pulses equally spaced within a
time interval of 80 msec, plotted as a function of the number of pulses.
The circles are experimental and the crosses theoretical. Adapted with
permission from Zwislocki (1960, p 1052).

Fig 33. Threshold of audibility for a practically infinite series of short
pulses as a function of pulse repetition rate. The circles are
experimental, and the curve is theoretical. Adapted with permission from
Zwislocki (1960, p 1052).

When the duration of the sequence is made very long ($t \gg 1/\alpha$),
the expression $e^{-\alpha t}$ approaches zero, and Eq 170 can be written
in the form

$$\frac{I_n}{I_1} = 1 - e^{-\alpha/\nu}, \tag{171}$$

where $\nu = n/t$ is the repetition rate. We then obtain a prediction of
the threshold as a function of the pulse repetition rate. The theoretical
and experimental data are compared in Fig 33, in which the points indicate
the experimental results. The pulses were 0.5 msec wide. For shorter
pulses, the agreement becomes less good because of spectral interference.
A still better agreement can be obtained when pulses are replaced by
short tone bursts.
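The dependence on repetition rate can be sketched as follows, assuming Eq 171 gives the threshold of a long pulse train relative to a single pulse as $1 - e^{-\alpha/\nu}$ and taking $1/\alpha = 0.2$ sec from Eq 152. The function name is an illustrative choice.

```python
import math


def rate_threshold_shift_db(rate_per_sec: float, tau_sec: float = 0.2) -> float:
    """Threshold of a long pulse train relative to a single pulse, in db.

    Assumes I_n / I_1 = 1 - exp(-alpha / nu) with alpha = 1 / tau.
    """
    return 10.0 * math.log10(1.0 - math.exp(-1.0 / (rate_per_sec * tau_sec)))


if __name__ == "__main__":
    for rate in (1, 5, 10, 50, 100, 500, 1000, 2000):
        print(f"{rate:5d} per sec: {rate_threshold_shift_db(rate):7.2f} db")
```

At rates that are low compared with $\alpha$ the pulses do not summate and the shift is close to zero; at high rates the threshold falls by very nearly 3 db for each doubling of the rate.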

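In order to find a mathematical expression for sequences of tone bursts,
we go back to Eq 164 and rewrite it in the form

$$\frac{\eta}{\epsilon} = 1 + e^{-\alpha \Delta t} + \cdots + e^{-\alpha \kappa \Delta t} + \cdots + e^{-\alpha(\kappa+\sigma)\Delta t} + \cdots + e^{-\alpha(2\kappa+\sigma)\Delta t} + \cdots, \tag{172}$$

where $\kappa$ and $\sigma$ are arbitrary numbers and $\Delta t = 1/f$.

By suppressing all pulses between the numbers $\kappa$ and
$\kappa + \sigma$, between $2\kappa + \sigma$ and $2\kappa + 2\sigma$, and
so forth, an intermittent stimulus is generated. It is on during the time
$T_N = (\kappa + 1)\,\Delta t$ and off during the time
$T_F = (\sigma - 1)\,\Delta t$. Under these conditions, Eq 172 becomes

$$\frac{\eta}{\epsilon} = (1 + e^{-\alpha \Delta t} + \cdots + e^{-\alpha \kappa \Delta t})(1 + e^{-\alpha(\kappa+\sigma)\Delta t} + \cdots + e^{-\alpha(n-1)(\kappa+\sigma)\Delta t}). \tag{173}$$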




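For steady state, that is, for $n \to \infty$, Eq 173 may be simplified
according to Eq 165:

$$\eta_\nu = \epsilon\,(1 - e^{-\alpha(\kappa+\sigma)\Delta t})^{-1} \sum_{i=0}^{\kappa} e^{-i\alpha \Delta t}. \tag{174}$$

Since $f = 1/\Delta t$ is the carrier frequency of the intermittent tone,
and $\nu = 1/[(\kappa + \sigma)\,\Delta t]$ is the burst repetition rate,
we can also write

$$\eta_\nu = \epsilon\,(1 - e^{-\alpha/\nu})^{-1} \sum_{i=0}^{\kappa} e^{-i\alpha/f}. \tag{175}$$

Using Eqs 165 and 175, it is possible to compare the excitations produced
by the intermittent tone ($\eta_\nu$) and the continuous tone
($\eta_\infty$):

$$\frac{\eta_\nu}{\eta_\infty} = (1 - e^{-\alpha/f})(1 - e^{-\alpha/\nu})^{-1} \sum_{i=0}^{\kappa} e^{-i\alpha/f}. \tag{176}$$

This can be simplified to

$$\frac{\eta_\nu}{\eta_\infty} = (1 - e^{-\alpha(\kappa+1)/f})(1 - e^{-\alpha/\nu})^{-1}. \tag{177}$$

Assuming again that, at the threshold, $\eta_\nu = \eta_\infty$ and
introducing $T_N = (\kappa + 1)\,\Delta t$, we obtain

$$\frac{I_\nu}{I_\infty} = \frac{1 - e^{-\alpha/\nu}}{1 - e^{-\alpha T_N}}, \tag{178}$$

or

$$\mathrm{TS} = 10 \log\,(1 - e^{-\alpha/\nu}) - 10 \log\,(1 - e^{-\alpha T_N}). \tag{179}$$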
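The steady-state result for the intermittent tone can be verified by brute force. The sketch below assumes the reading of the derivation adopted here: each burst contains $\kappa + 1$ elemental stimuli spaced $\Delta t = 1/f$ apart, bursts repeat every $(\kappa + \sigma)\,\Delta t$, and every stimulus contributes an excitation that decays as $e^{-\alpha t}$. The parameter values, the symbol names, and the truncation at 2000 bursts are illustrative assumptions.

```python
import math


def intermittent_ratio_direct(kappa: int, sigma: int, f_hz: float,
                              alpha: float = 5.0, n_bursts: int = 2000) -> float:
    """eta_nu / eta_inf by explicit summation over stimuli and bursts."""
    dt = 1.0 / f_hz
    period = (kappa + sigma) * dt
    one_burst = sum(math.exp(-alpha * i * dt) for i in range(kappa + 1))
    all_bursts = sum(math.exp(-alpha * m * period) for m in range(n_bursts))
    eta_nu = one_burst * all_bursts                  # in units of epsilon
    eta_inf = 1.0 / (1.0 - math.exp(-alpha * dt))    # continuous tone
    return eta_nu / eta_inf


def intermittent_ratio_closed(kappa: int, sigma: int, f_hz: float,
                              alpha: float = 5.0) -> float:
    """Closed form (1 - exp(-alpha*T_N)) / (1 - exp(-alpha/nu))."""
    dt = 1.0 / f_hz
    t_on = (kappa + 1) * dt
    rate = 1.0 / ((kappa + sigma) * dt)
    return (1.0 - math.exp(-alpha * t_on)) / (1.0 - math.exp(-alpha / rate))


if __name__ == "__main__":
    # 50 msec bursts of a 1000 cps tone repeated 10 times per second.
    ratio = intermittent_ratio_closed(49, 51, 1000.0)
    print(ratio, -10.0 * math.log10(ratio))  # excitation ratio, TS in db
```

The direct sum and the closed form agree, and the threshold shift of Eq 179 is simply $-10 \log$ of this excitation ratio.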


Garner (1947a) measured the threshold of audibility of an intermittent
1000 cps tone for various burst repetition rates and burst durations. His
data are compared to the theoretical predictions in Fig 34. The agreement
is reasonably good with respect to the slope of the curves as well as with
respect to their position.

Fig 34. Threshold of audibility for an intermittent 1000 cps tone as a
function of repetition rate (in bursts per second) and burst duration. The
points are experimental; the curves are calculated. Adapted with
permission from Zwislocki (1960, p 1054).

Fig 35. Threshold of audibility for 5 msec bursts of a 2000 cps tone as a
function of burst repetition rate. The circles indicate experimental
values; the curve is calculated. Adapted with permission from Zwislocki et
al (1962, p 1651).

In Eq 175, the sum

$$\sum_{i=0}^{\kappa} e^{-i\alpha/f} \tag{180}$$

determines the excitation caused by one burst, so that the ratio between
the excitations by an infinite number of bursts and by one burst amounts
to

$$\frac{\eta_\nu}{\eta_1} = (1 - e^{-\alpha/\nu})^{-1}. \tag{181}$$

The corresponding threshold shift is

$$\mathrm{TS} = 10 \log\,(1 - e^{-\alpha/\nu}), \tag{182}$$

in agreement with the threshold shift for pulses (Eq 171). Figure 35
compares theoretical data to those obtained by Scholl (1961) with 5 msec
bursts of a 2000 cps tone.

Many more experiments on the threshold of audibility as a function of
stimulus duration were performed. No experiment has yet been reported
that contradicts the theory of temporal summation previously discussed.
On the contrary, its validity was extended to sounds heard in presence
of interfering noise (Zwicker & Wright, 1963) and to suprathreshold
stimuli. In the application of the theory, it is necessary to pay
attention to the frequency spectrum, however. The threshold of a
broad-band random noise, for instance, varies less with duration than does
the threshold of a narrow-band stimulus. The phenomenon was explained by
Scholl (1961) in terms of critical band formation. Here, we may mention
that the theory of temporal summation also applies to other sense
modalities and motor activity (Zwislocki, 1960). It is possible that we
are dealing with a general psychophysical law that, for the threshold of
detectability, may be expressed by the equation

$$\frac{I_{T_x}}{I_{T_y}} = \frac{\sum_j (I_j/I_{T_y})\, e^{-\alpha \tau_j}}{\sum_i (I_i/I_{T_x})\, e^{-\alpha \tau_i}}, \tag{183}$$

where $I_{T_x}$ means the threshold intensity of a stimulus sequence,
$I_{T_y}$ the threshold intensity of another stimulus sequence, $I_i$ and
$I_j$ the intensities of elemental stimuli in each sequence, $\tau_i$ and
$\tau_j$ the time intervals between elemental stimuli and the effective
end of the sequence, and $1/\alpha$ a time constant. It should be
emphasized that $\alpha$ is a system constant which does not depend in any
way on the stimulus. Neither the intensities of the elemental stimuli nor
the time intervals between them need be constant. The term “effective end
of the sequence” means the time at which the excitation $\eta$ reaches its
maximum. For certain functions $I(t)$, this may happen before the
stimulation ends. Thereby, the assumption is made that the maximum
excitation determines the threshold.

Equation 183 may also be written in the form

$$\sum_i I_i\, e^{-\alpha \tau_i} = \sum_j I_j\, e^{-\alpha \tau_j}. \tag{184}$$

When the intervals $\tau_i - \tau_{i+1}$ and $\tau_j - \tau_{j+1}$ are
very small, the sums of Eq 184 may be replaced by convolution integrals,
and we obtain

$$\int_0^{T_x} I_x(\tau)\, e^{-\alpha \tau}\, d\tau = \int_0^{T_y} I_y(\tau)\, e^{-\alpha \tau}\, d\tau, \tag{185}$$

with $T_x$ and $T_y$ denoting the effective sequence durations. It should
be noted that, for the purpose of determining these durations, the maximum
excitation is assumed to occur at the time for which

$$\frac{d\eta}{dt} = 0 \quad \text{and} \quad \frac{d^2\eta}{dt^2} < 0.$$

4.2 Threshold of Audibility as a Function of Sound Frequency

The threshold of audibility for pure tones strongly depends on sound
frequency. An indication that temporal summation contributes to this
dependence has already been given in the first part of this section. The
transmission characteristic of the ear has an even stronger effect. A
threshold curve measured in a free sound field is plotted by means of the
solid line in Fig 36. It is an average of three independent studies in
which the listeners faced the sound source (Sivian & White, 1933;
Fletcher, 1953; Zwislocki, 1957). In all three studies, the sound pressure
was measured at the center of the listener's head. The intermittent line
shows a partly theoretical and partly experimental transmission
characteristic of the ear, which was determined in the following manner.

In Sec 2, we determined the transmission characteristic as the ratio
between the sound pressure in a free field and the maximum volume
displacement of the cochlear partition. It is unlikely that the volume
displacement itself controls the stimulation of the sensory cells. A
comparison of electrophysiological measurements of Tasaki, Davis, and
Legouix (1952) with the mechanical measurements of v. Békésy (1943) leads
to the conclusion that cochlear microphonics are approximately
proportional to the amplitude of displacement. Since cochlear
microphonics are regarded as the trigger for neural discharges, it is
probably more correct to deal with the displacement amplitude of the
cochlear partition than with its volume displacement. The width of the
basilar membrane, which supports the sensory cells, increases from the
stapes to the helicotrema by a factor of five (Wever, 1938). As a
consequence, the ratio between the relevant displacement amplitude and the
volume displacement of the partition changes from one end of the cochlea
to the other by approximately the same amount. Because of the correlation
between sound frequency and location along the cochlear canal, this
results in a slight improvement of sound transmission for high frequencies
relative to the low ones. A ratio of five for the amplitude corresponds to
an increment of 13 db. We shall assume that the ratio between displacement
amplitude and volume displacement increases in direct proportion to
distance from the helicotrema and therefore in nearly direct proportion to
the logarithm of frequency. Since the effect is small, the knowledge of
the exact function is not necessary.

Fig 36. An average threshold of audibility measured under free-field
conditions and the transmission characteristic of the ear as a function of
frequency.

The transmission characteristic of Fig 36 is that derived in Sec 2
corrected by the ratio of the displacement amplitude of the basilar
membrane to its volume displacement. Theoretically, it determines the
amplitude of cochlear microphonics for a given free-field sound pressure
measured at the location of the center of the listener's head. There is an
obvious difference between the transmission characteristic and the
threshold curve, which means that the transmission properties of the ear
are not the only controlling factor. The decibel difference between the
two curves is shown in Fig 37 by closed circles. Over the frequency range
between 200 and 8000 cps, the points cluster around a regression line with
a slope of 3 db per octave. This line agrees almost exactly with the
threshold change predicted from the theory of temporal summation
(Eqs 171 and 182, and Figs 33 and 35). Only at the lowest frequencies is
there a significant departure of the points from the regression line. It
coincides with the region in the cochlea where the density of innervation
begins to decrease. A correlation between the density of innervation and
the sensitivity is well known for both the eye and the skin. We make the
simple assumption, therefore, that the threshold intensity is inversely
proportional to the density of innervation. According to Wever (1949),
there are about half as many ganglion cells per millimeter of length of
the basilar membrane at the place of vibration maximum for 100 cps as
there are at locations corresponding to frequencies above 500 cps. This
corresponds to a threshold increment of 3 db, in approximate agreement
with the position of the 100 cps point relative to the regression line.

Fig 37. The decibel difference between the threshold of audibility and the
transmission characteristic of Fig 36 (circles), and a theoretical curve
resulting from temporal summation.

It is possible to conclude on the basis of Figs 36 and 37 that the
threshold of audibility as a function of sound frequency is controlled in
the main by the transmission characteristic of the ear, the density of
ganglion cells as projected on the basilar membrane, and the temporal
summation in the nervous system. This appears to be true for pure tones;
wide-band noise introduces an additional factor.

Fletcher (1940) assumed that spectral energy of noise was integrated
over the bandwidth of a critical ratio. According to Feldtkeller and
Zwicker (1956) the summation occurs over the entire critical band
(Secs 3 and 5). If we assume a random noise with a uniform spectral
energy density $\sigma$, then the energy in a critical band is
$\sigma\kappa$. It was shown in Sec 3 that the critical bandwidth is
proportional to the frequency jnd and, therefore, depends on frequency
according to the equation

(186) 

which is the same as Eq 141 except for the multiplicative constant $a$,
approximately equal to 25. The equation can also be written in the form

$$\kappa = Kf + \kappa_0, \tag{187}$$

where $K$ is a constant and $\kappa_0 = 63$ cps.

Because of energy summation, the threshold of audibility for wide-band
noise decreases with frequency faster than does the threshold for pure
tones. The ratio between the two is expressed by $\kappa$.


5. MASKING, CRITICAL BAND, AND CONTRAST PHENOMENA

A complex sound containing two or more sinusoidal components may be
heard as one event or, within certain limits, each component may be
listened to separately. This analytical property of the auditory system is
ascribed in part to the wave pattern in the cochlea. In Sec 2, it was
shown that for each audible frequency there is a maximum of vibration in a
different portion of the cochlea. Physiological evidence indicates that
the location of the vibration maximum determines the group of maximally
excited nerve endings which, as we assume, leads to pitch perception. We
saw that the vibration maximum is broad and, in itself, would permit only
a very coarse sound analysis. The distribution of excitation in the
peripheral nervous system is considerably sharper, however, so that the
amazing frequency sensitivity of hearing may be explained (Galambos &
Davis, 1943; Tasaki, 1954; Katsuki et al, 1958). The sharpening mechanism
has been a subject of debate for a long time (v. Békésy, 1929, 1960a,b;
Huggins & Licklider, 1951), and we shall discuss one possibility later in
this section.

No analyzer has an infinite resolving power, and frequency components
can interfere with each other to a greater or lesser extent, depending on
their spacing along the frequency scale. One aspect of such interference
in hearing is called masking. It means that a stronger component may
prevent a weaker one from being heard. The masking effect is measured
most frequently as a threshold elevation of one or a group of frequency
components produced by other stronger frequency components. The masking
sound is often called a “masker.” Masking effects produced by a pure tone
of 400 cps and by a narrow band of noise of 90 cps width centered at
410 cps are shown in Fig 38 (Egan & Hake, 1950). The threshold elevation
in decibels over the threshold obtained in the absence of the masker is
plotted as a function of frequency of the exploring pure tone. Although
the sound-pressure level of both maskers was the same, the narrow-band
noise produced a greater maximum effect. The pure tone masking curve is
flatter and spreads somewhat further toward high frequencies. It also
shows several relative maxima and minima. For a long time, the complex
shape of the pure tone masking curve was regarded as a result of
nonlinearities in the mechanical system of the ear. More recently,
v. Békésy (1957) suggested a different explanation. We shall not discuss
these secondary effects here, except to state that differences in
relations and amplitude patterns are probably the main reasons for the
different masking effects of random noise and pure tones. These conditions
also depend on the noise bandwidth, so that the masking effect of a
wide-band noise cannot be completely predicted from the masking effects of
narrow-band components.

One of the most important aspects of the masking phenomenon is that
threshold elevation at a given frequency is proportional to the noise
energy within the critical band centered on that frequency. Fletcher
(1940) introduced it first as a postulate and defined the critical band so
that a tone is just masked when its power is equal to the power of the
noise within the band. It was later demonstrated (1954) that the power was
summed over a wider band, so that the noise power within it exceeds the
power of the just masked tone. Fletcher's band is now called the “critical
ratio,” and the name “critical band” is reserved for the band discovered
by Zwicker. It is the same critical bandwidth that was described in Sec 3
and in Sec 4 for power summation in the threshold of audibility. In this
section, we shall see still another aspect of the critical band concept.

Because the critical bandwidth increases with frequency, a broad-band
noise with uniform spectral power produces a greater masking effect at
high than at low frequencies (Hawkins & Stevens, 1950). Specifically,
denoting by $I_T$ the threshold intensity in quiet, by $I_M$ that in
presence of the masking noise, and by $I_S$ the spectral intensity of the
noise, we obtain

$$\frac{I_M}{I_T} = 0.4\,\kappa\,\frac{I_S}{I_T}, \tag{188}$$

where $\kappa$ is the critical bandwidth. In terms of threshold shift,
Eq 188 becomes

$$\mathrm{TS} \approx 10 \log \frac{I_S}{I_T} + 10 \log \kappa - 4, \tag{189}$$

or, introducing for the critical band its expression from Eq 187,

$$\mathrm{TS} \approx 10 \log \frac{I_S}{I_T} + 10 \log\,(Kf + \kappa_0) - 4. \tag{190}$$

If we are interested in the masked threshold rather than in the threshold
shift, we can write

$$T_M = 10 \log \frac{I_S}{I_0} + 10 \log\,(Kf + \kappa_0) - 4, \tag{191}$$

where $I_0 = p_0^2/\rho c$ and $p_0$ is the effective reference sound
pressure, usually taken as 0.0002 dyne/cm². Replacing intensities by sound
pressures in Eq 191 we obtain

$$T_M \approx 20 \log \frac{p_S}{p_0} + 10 \log\,(Kf + \kappa_0) - 4. \tag{192}$$

Equations 191 and 192 indicate that the threshold of a pure tone in
presence of a uniform random noise is frequency independent for very low
frequencies and increases in direct proportion to frequency in the high
frequency region.

All equations derived so far in this section show that the masked
threshold increases in direct proportion to the spectral intensity. This
is not quite true when the noise intensity per critical band only slightly
exceeds the threshold of audibility in quiet. Figure 39 illustrates the
situation when the noise intensity is plotted per critical ratio, as
defined in terms of TS by Eq 189 (Hawkins & Stevens, 1950). For critical
ratio levels exceeding 10 db, the masked threshold is directly
proportional to the noise level. However, below 10 db the curve flattens
and asymptotically reaches the threshold level in quiet. This agrees with
the assumption made in Sec 4 that there is a considerable amount of
spontaneous activity in the auditory nervous system and that this activity
may be introduced as an equivalent stimulus intensity. Extending the
assumption to mean that this equivalent intensity produces a masking
effect which, in fact, determines the threshold in absence of an extrinsic
masker, we can correct Eqs 189 and 190 by adding the corresponding
spectral intensity $I_N$ (the equivalent intensity divided by $\kappa$) to
$I_S$ and write

$$\mathrm{TS} \approx 10 \log \frac{I_S + I_N}{I_T} + 10 \log\,(Kf + \kappa_0) - 4. \tag{193}$$

Fig 39. Masking of pure tones by random noise. The threshold shift is
plotted as a function of the effective noise level. Adapted with
permission from Hawkins & Stevens (1950, p 11).

In absence of extrinsic maskers, Eq 193 becomes

$$\mathrm{TS} \approx 10 \log \frac{I_N}{I_T} + 10 \log \kappa - 4 = 0, \tag{194}$$

indicating that

$$10 \log\,(I_N \kappa) = 10 \log I_T + 4, \tag{195}$$

that is, the stimulus intensity per critical band that is equivalent to
spontaneous neural activity is 4 db above the threshold of audibility.
With the numerical value of $I_N$ so determined, Eq 193 correctly
describes the curve of Fig 39. As a consequence, Eqs 191 and 192 have to
be rewritten as follows:

$$T_M = 10 \log \frac{I_S + I_N}{I_0} + 10 \log\,(Kf + \kappa_0) - 4, \tag{196}$$

$$T_M = 20 \log \frac{\sqrt{p_S^2 + p_N^2}}{p_0} + 10 \log\,(Kf + \kappa_0) - 4. \tag{197}$$
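The flattening of the masking curve near threshold follows directly from adding the equivalent intensity of the spontaneous activity. The sketch below assumes the reading of Eqs 189, 193, and 195 adopted here: the effective noise level $Z$ is the critical-ratio level $10 \log (I_S/I_T) + 10 \log \kappa - 4$, and the spontaneous activity contributes exactly 0 db on that scale, so that $\mathrm{TS} = 10 \log (1 + 10^{Z/10})$. The function name is hypothetical.

```python
import math


def masked_threshold_shift_db(effective_level_db: float) -> float:
    """Threshold shift when the extrinsic noise power is added to the
    equivalent power of the spontaneous activity.

    `effective_level_db` is the noise level per critical ratio relative
    to the threshold in quiet (Z in the accompanying text).
    """
    return 10.0 * math.log10(1.0 + 10.0 ** (effective_level_db / 10.0))


if __name__ == "__main__":
    for z in (-20, -10, 0, 10, 20, 40):
        print(f"Z = {z:4d} db -> TS = {masked_threshold_shift_db(z):6.2f} db")
```

For effective levels above about 10 db the shift is practically equal to the noise level, while for negative levels it tends to zero, which reproduces the general shape described for Fig 39.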

of frequency has often been 
Xm If fh" ' distribution of excitation in the nervous 

Fm 1 7 i a comparison of Fig 38 with 

fhfn the vT. *>a^ a much sharper maximum 

Phvsil " 1 ■" '^“Wea, in agreemem with neuro- 

da?rof nhet"®^ belong to the 

When fnr mctnn the well known contrast effects in vision 

Its edee annenrc l^^m^ a darker surround, 

darkest m the immerl^'t “”’eal portion and the surround appears 

be schematized as su ^ ''‘emity of the boundary The phenomenon may 
d's nbuTion '? T M T '^'■e apper trace indfcates an idealized 
agamst L darke® ' of a light field presented 

appareurbn^LmL:'’"™"''' ^"Ls the ’Resulting 
Fig. 40. A schematic representation of sensory contrast phenomena.

A considerable amount of experimental and theoretical work has
been done on contrast phenomena. These effects are generally ascribed
to lateral neural inhibition, which was clearly demonstrated by Hartline
and his co-workers in experiments on the Limulus eye (Hartline, 1949;
Hartline & Ratliff, 1957; Ratliff, Hartline, & Miller, 1963). Since such
an inhibition may occur at several neural levels and its mechanism is not
known in all necessary detail, v. Bekesy suggested a fictitious neural unit
as a sensation element. The unit could also be called a response unit since
it directly relates the sensation to the stimulus. It consists of an area of
sensation surrounded by an area of inhibition, as shown in Fig. 41
(v. Bekesy, 1960b). The upper drawing (a) schematizes an introspective
impression of sensation distribution in a one-point experiment on the skin;
the lower drawing (b) shows a simplification of (a) for computational purposes.
Because of the smallness of the unit, the simplification does not materially
affect the computational results. The height of the sensation area of the
unit in Fig. 41b is made equal to the stimulus intensity at a given point of
the receptor area. Otherwise, the unit is defined by three parameters:
the extent of each of the inhibition areas r, the extent of the sensation area
s, and the ratio between the sensation area and the inhibition areas S/R.
The extent of the sensation area is not critical. v. Bekesy determined
the parameters for the eye and the skin (touch) and constructed sensation
distributions for various distributions of stimulus intensity. Figure 42
(v. Bekesy, 1960b) shows as an example a stimulus increasing according
to a constant gradient bounded at both ends by areas of constant stimulus
intensity. It can be seen that a depression of sensation is associated with a
positive second derivative of stimulus intensity and an enhancement
with a negative second derivative. The extent of the areas of suppression
and enhancement correlates with the extent of the elementary area of
inhibition, which is much greater for the eye than for the skin. More
exactly, the extent of these areas is equal to 2r.
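
The behavior of the response unit on a ramp-shaped stimulus can be illustrated with a short numerical sketch. It is not v. Bekesy's graphical procedure: the unit is discretized on a uniform grid, the ratio S/R is taken here as the sensation area divided by the summed area of both inhibition flanks, and the values chosen for s, r, S/R, and the stimulus are hypothetical.

```python
def response_unit(s, r, s_over_r, dx):
    """Discretized response unit (illustrative assumption, not measured values).

    The central sensation area (width s, height 1) is flanked on each side
    by an inhibition area of width r. The flank depth is chosen so that the
    sensation area divided by the summed inhibition area equals s_over_r.
    """
    n_s = max(1, round(s / dx))
    n_r = max(1, round(r / dx))
    depth = (n_s * dx) / (s_over_r * 2 * n_r * dx)
    return [-depth] * n_r + [1.0] * n_s + [-depth] * n_r


def sensation(stimulus, unit, dx):
    """Superimpose one unit per receptor point, each scaled by the local stimulus.

    The unit is symmetric, so summing weighted neighbors around each point
    gives the same result as adding up the shifted, scaled units.
    """
    half = len(unit) // 2
    last = len(stimulus) - 1
    out = []
    for i in range(len(stimulus)):
        total = 0.0
        for j, weight in enumerate(unit):
            k = min(max(i + j - half, 0), last)  # hold the edge value outside the array
            total += weight * stimulus[k]
        out.append(total * dx)
    return out


# Constant level, constant gradient, constant level (compare Fig. 42).
stimulus = [1.0] * 30 + [1.0 + 0.05 * n for n in range(1, 21)] + [2.0] * 30
unit = response_unit(s=3.0, r=5.0, s_over_r=2.0, dx=1.0)
y = sensation(stimulus, unit, dx=1.0)
gain = sum(unit)  # response per unit stimulus on a uniform field (dx = 1)
print("foot of ramp:", y[29], "uniform-field value:", gain * stimulus[29])
print("top of ramp: ", y[49], "uniform-field value:", gain * stimulus[49])
```

At the foot of the ramp, where the second derivative of the stimulus is positive, the computed sensation falls below the uniform-field value, and at the top of the ramp it rises above it, in line with the description of Fig. 42.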

The numerical values of the parameters of the response unit can be
determined from psychophysical experiments. For the determination of the
elementary inhibition area, v. Bekesy used a trapezoidal distribution of
stimulus intensity, as shown in Fig. 43. When the length of the top of the
trapezoid is shorter than or equal to r, the sensation distribution is flat
topped. When the length r is exceeded, however, the sensation magnitude
shows a maximum at each of the two upper edges of the stimulus distribution.

Fig. 43. Sensation distribution as a function of the width of a trapezoid
stimulus. Adapted with permission from v. Bekesy (1960b).

The extent of the sensation area s may be determined, according to
v. Bekesy, by means of a two-point stimulation. When the points are close
to each other, a unimodal distribution of sensation magnitude is obtained.
When the separation exceeds the width s, two sensation maxima become
apparent.

The ratio S/R was determined by v. Bekesy with the help of a
trial-and-error method. He first found a distribution of stimulus intensity
producing a uniform distribution of sensation magnitude, as shown in
Fig. 44. In the next step, he matched this distribution by varying the S/R
ratio.

Fig. 44. Sensation distribution for various curvatures of a wide stimulus. The curvature
under (t) produces an even distribution. Adapted with permission from v. Bekesy
(1960b, p. 1067).

v. Bekesy's response unit should be applicable to hearing, although the
corresponding computations have not been made as yet. The distribution
of the stimulus magnitude over the sensory cells in the cochlea is controlled
by the frequency spectrum at the oval window and by the pattern of
vibration of the basilar membrane. A pure tone produces an amplitude
pattern of volume displacement as shown in Fig. 12. It can be transformed
into a pattern of displacement amplitude of the basilar membrane by
taking into account the width of the membrane, according to the procedure
outlined in Sec. 4. When two or more frequency components are introduced,
the resulting effective amplitude amounts to

Y = W +Y^+ + (198) 

The distribution of the effective amplitude as a function of the distance
from the stapes is shown in Fig. 45 for two-tone complexes in the frequency
range between 500 and 1000 cps. The solid line corresponds to a frequency
separation of one critical band between the two components, and the
intermittent line to a separation of two critical bands. It can be seen that,
in the first instance, the amplitude distribution has a plateau whose
width is one-half of a critical band; in the second instance, the plateau
is widened to a whole critical band.

According to v. Bekesy's procedure, a flat-topped, approximately
trapezoidal distribution of stimulus magnitudes makes it possible to determine
the extent of the inhibition area. When the flat top is less than half the
extent of the total inhibition area of a response unit, the distribution of the
sensation magnitude is unimodal; when it exceeds one-half of the inhibition
area, the sensation magnitude shows two maxima coinciding with
the edges of the flat top. For a flat top equal to one-half the inhibition
area, the sensation distribution has a flat top also.

Greenwood (1961) performed masking experiments with two-tone
complexes and found that the distribution of the masking effect with
respect to sound frequency is unimodal as long as the spacing between the
components is less than one critical band. For greater frequency differences,
the distribution becomes bimodal. Characteristically, the appearing
maxima are closer together than the frequency components of the masking
complex.

By comparing Greenwood's results (Fig. 46) to the amplitude distribution
of Fig. 45, we can see that the transition from unimodal to bimodal
masking distribution occurs when the flat top of the amplitude distribution
has a width of one-half a critical band. If we agree to consider the masking
effect as a measure of neural excitation, we can conclude using v. Bekesy's
procedure that the critical band is equal to the extent of the inhibition
area of the response unit, that is, to $2r$. (199)



Fig. 45. Distribution of the effective vibration amplitude along the cochlear partition
for two-tone complexes in the frequency range from 500 to 1000 cps. The solid lines
correspond to a frequency separation between the components of one critical band, and
the intermittent lines correspond to a separation of two critical bands.
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f L '"‘'‘'"8= sound-pressure level, the 

sJ«eJr , . A ■"•’■bifon area. It appears to be a 

7.TZZ T *=y ‘he neural network This constancy does 

hilt th ^ ^ nf ‘be sensation and inhibition areas, 

1 itiT ■? ™y ‘o investigate this at present. 

Although v. Bekesy's procedure is essentially graphical, he indicates a
possible analytical method. It is based on the superposition theorem. In a
first step, the stimulus function is approximated by unit step functions, as
shown in Fig. 47. Next, the sensation area of the response unit is divided
into two components: $S_0$, such that $S_0/R = 1$, and the remainder of the
area, $S - S_0$. With the modified response unit $S_0$-$R$, the sensation
distribution is constructed for the step function. This distribution has values
different from zero only in the neighborhood of the step. It is used as a
new elementary function $A(x)$. According to the superposition theorem,
the sensation distribution for $f(x)$, based on $S_0$-$R$ units, is

$$y_0(x) = \int A(x - X)\, f'(X)\, dX, \qquad (200)$$

where x is the coordinate at which the sensation magnitude is being
determined and X is the distance between x and the origin of each step
function. Thus the sensation distribution based on the entire response
unit S-R is

$$y(x) = (S - S_0)\, f(x) + y_0(x). \qquad (201)$$

Fig. 47. Analytical method for calculation of sensation distribution by means of
superposition of v. Bekesy's response units: (a) approximation of stimulus distribution by
means of step functions, (b) response unit, (c) step function, (d) response distribution
for the step function under (c) obtained by means of superposition of response units.
Adapted with permission from v. Bekesy (1960b, p. 1069).
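
A discrete version of Eqs. 200 and 201 can be checked numerically. The sketch below is an illustration under stated assumptions rather than v. Bekesy's own computation: the sensation area is one grid sample wide, the stimulus is taken as zero outside the sampled range, and the unit depths and areas are hypothetical. It compares direct superposition of full S-R units with the step-function route.

```python
def direct(stimulus, unit, dx):
    """Direct superposition of units; the stimulus is zero outside the array."""
    n, half = len(stimulus), len(unit) // 2
    out = []
    for i in range(n):
        total = 0.0
        for j, weight in enumerate(unit):
            k = i - (j - half)
            if 0 <= k < n:
                total += weight * stimulus[k]
        out.append(total * dx)
    return out


def step_response(unit0, dx):
    """A(u) of the balanced S0-R unit for a unit step at u = 0.

    Index m of the returned list corresponds to u = m - half.
    """
    acc, out = 0.0, []
    for weight in unit0:
        acc += weight * dx
        out.append(acc)
    return out


def by_superposition(stimulus, unit0, remainder_area, dx):
    """Discrete form of Eqs. 200 and 201."""
    n, half = len(stimulus), len(unit0) // 2
    a = step_response(unit0, dx)
    # Step heights: rise from zero at the first sample, return to zero after the last.
    steps = [stimulus[0]]
    steps += [stimulus[k] - stimulus[k - 1] for k in range(1, n)]
    steps.append(-stimulus[-1])
    out = []
    for i in range(n):
        y0 = 0.0
        for k, height in enumerate(steps):
            u = i - k
            if -half <= u <= half:  # A vanishes elsewhere because S0 = R
                y0 += a[u + half] * height
        out.append(remainder_area * stimulus[i] + y0)
    return out


dx = 1.0
depth = 0.15                           # hypothetical inhibition depth per flank sample
flank = [-depth] * 5
r_area = 2 * 5 * depth * dx            # summed inhibition area R
unit = flank + [2.5] + flank           # full unit with sensation area S = 2.5
unit0 = flank + [r_area / dx] + flank  # balanced part with S0 = R
stimulus = [1.0] * 30 + [1.0 + 0.05 * n for n in range(1, 21)] + [2.0] * 30
y_direct = direct(stimulus, unit, dx)
y_super = by_superposition(stimulus, unit0, remainder_area=2.5 * dx - r_area, dx=dx)
print(max(abs(p - q) for p, q in zip(y_direct, y_super)))
```

The two routes agree to rounding error in this setting, which is the content of the superposition theorem for a linear unit.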

I. 'V* ‘he usefulness of v Bekesy’s response unit 

Its disadvantage is that, in general, the ratio S/R changes with stimulus
intensity. Nevertheless, it has already fulfilled the purpose of indicating
the extent of unit inhibition areas in the eye, in the skin for touch, and in
the ear.


6 THE LOUDNESS FUNCTION 

loifdn«=“®'' “s => qualitative attribute of sound, 
identity. Such an axiom has never been seriously questioned, and loudness
matches are accepted as standard experimental and clinical procedures.
They are even used for earphone calibration.

Perfect stimulus identity is practically impossible to achieve, but it can
be closely approximated when a sound with a certain frequency and phase
spectrum is compared in loudness to another sound with the same spectrum.
Such a comparison is trivial, of course, unless the relation between
the sound pressure and loudness of one sound is different from that of the
other. The necessary variation may be produced by means of a masking
noise or as a result of certain pathological changes in the auditory system.

A considerable number of experiments were performed in which the
loudness of a masked pure tone was compared to that of the same tone in
the absence of masking; unfortunately, only a few of these experiments
were carefully balanced. A monaural experiment designed especially to
validate the loudness scaling procedures was performed in the following
general way (Hellman & Zwislocki, 1964). A 1000 cps tone was presented
to one ear together with a random noise whose sound-pressure level
was varied parametrically. The observers estimated the loudness of the
tone by assigning numbers according to the procedure of magnitude
estimation without a designated standard and, later, by adjusting the
loudness to match given numbers according to the method of magnitude
production, also without a designated standard. The combined method is
called "numerical magnitude balance." By presenting the tone
intermittently with the noise in an auxiliary experimental series, small biases
produced by listening to one signal in the presence of another were
eliminated. Next, the loudness of the masked tone was compared by a
method of adjustment to the loudness of the same tone in the absence of
masking. In order to minimize biases, the masked tone was adjusted in
one series and the unmasked in the other. The final results of direct
loudness matches are compared to those derived from numerical magnitude
balance in Fig. 48. The closed circles indicate the direct results; the curves,
the derived ones. The evident agreement between them could not have
been achieved unless the observers judged correctly the loudness ratios in
the procedure of numerical magnitude balance. The conclusion that
observers can express loudness magnitudes in numbers appears inescapable.
It is reinforced by the agreement of the direct loudness matches with
results of similar experiments performed by other investigators who used
balanced procedures. Loudness matches between one normal and one
pathological ear obtained on a large number of patients have led to
similar data (Miskolczy-Fodor, 1960).

Fig. 48. Sensation level of an unmasked 1000 cps tone versus the sensation level (in
quiet) of the same tone in presence of masking noise when both produce the same loudness.
The circles indicate the results of direct loudness matches; the curves were derived from
numerical magnitude balancing. Adapted with permission from Hellman & Zwislocki (1964).

The direct results obtained by means of numerical magnitude balance
are plotted in Fig. 49 as a function of sound-pressure level. The highest
loudness values were obtained in the absence of the masking noise. The
remaining two curves show the course of the loudness function when the
masking effect amounts to 40 db and 60 db, respectively. Open circles
indicate the corresponding threshold levels. At high sound-pressure levels,
the curves approach a straight line obeying the power function

$$L = KP^{2\theta} \qquad (202)$$

to haveTo™ T “session of Steven's (1957) power law, which appears 
depends of th h The consent K 

St muTus "a ^ --“dal.ty and certain 

nTrodm tv h T. T ® <=haracterlst.c of each 
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Sne s cu™ A bmaural 

.96o.^MirnrzZocr 'm3rT\rT ^ 

Usually, the units are so chosen that the curve passes through the point
40 db sensation level and one sone, which is the unit of loudness
(Stevens, 1936).

the loudness^^nctiM^L*” derive a general mathematical expression for 
(Heilman & ^“^1 .9^“* ” P'S 

power law and an anv’i ^ ""dh the help of the 

well established ' psychoph^ical phenomenon which appears 
Fig. 49. Loudness of a monaurally presented 1000 cps tone in quiet and in presence of
a random masking noise. The curves indicate averages of experimental data, and the
open circles the corresponding threshold levels. The closed circles and crosses indicate
the theoretical loudness values.

The loudness of a sound signal consisting of more than one frequency
component depends not only on the sound pressure of the components
but also on the frequency spacing between them (Zwicker & Feldtkeller,
1955; Bauch, 1956; Zwicker, Flottorp, & Stevens, 1957). When the
frequency intervals are so large that there is little interference among the
components, the total loudness is equal to the sum of component loudnesses
(Fletcher, 1940). When the frequency spectrum is limited to one
critical band, the loudness depends on the total power, irrespective of its
frequency distribution. Thus we can write

$$L = K\left(\sum_{f}^{f+CB} P_f^{2}\right)^{\theta} \qquad (203)$$

for the loudness function at sufficiently high sound-pressure levels. The
complex sound may consist, for instance, of a band of random noise and
a sinusoid in its center. For such conditions, Eq. 203 may be rewritten
in the form

$$L = K(P_S^{2} + P_N^{2})^{\theta}, \qquad (204)$$

where $P_S$ means the effective pressure of the sinusoid and $P_N$ that of the
random noise.

Because of the analytic properties of our auditory system, which go
beyond the phenomena described by the place theory (Licklider, 1959),
we are able to listen either to the total acoustic event, to the random
noise alone, or to the sinusoid alone. Figure 49 indicates what happens
to loudness in the latter event. It is quite clear that the loudness of the
random noise is disregarded, since the perceived loudness approaches zero
at 47 or even 67 db, where the noise is well above its threshold of audibility.

Probably the simplest way of representing mathematically the phenomenon
of selective listening is to subtract the loudness of noise from the
total loudness. When the spectrum is limited to one critical band, this
leads to the equation

$$L_S = K(P_S^{2} + P_N^{2})^{\theta} - L_N \qquad (205)$$

for the loudness of the sinusoid. From Eqs. 202 and 203,

$$L_N = KP_N^{2\theta}, \qquad (206)$$

hence we have

$$L_S = K[(P_S^{2} + P_N^{2})^{\theta} - P_N^{2\theta}]. \qquad (207)$$

In Secs. 4 and 5, the existence of an intrinsic physiological noise was
postulated, and it was assumed that the threshold of audibility in quiet
is determined by the masking effect of this noise. This assumption
led to a calculation of the equivalent stimulus power of the intrinsic
noise, with the result that

$$P_{NI}^{2} = 2.5P_T^{2}, \qquad (208)$$

where $P_{NI}$ is the equivalent sound pressure of the intrinsic noise and $P_T$
the sound pressure at the threshold of audibility. The noise in Eq. 207 may
be regarded as if it consisted of two components, an extrinsic
component $P_{NE}$ and an intrinsic component $P_{NI}$. Consequently,

$$P_N^{2} = P_{NE}^{2} + P_{NI}^{2}, \qquad (209)$$

and Eq. 207 can be written in the form

$$L_S = K[(P_S^{2} + P_{NE}^{2} + P_{NI}^{2})^{\theta} - (P_{NE}^{2} + P_{NI}^{2})^{\theta}]. \qquad (210)$$

When no extrinsic noise is introduced, Eq. 210 becomes

$$L_S = K[(P_S^{2} + P_{NI}^{2})^{\theta} - P_{NI}^{2\theta}], \qquad (211)$$

or, introducing the value of $P_{NI}^{2}$ from Eq. 208,

$$L_S = K[(P_S^{2} + 2.5P_T^{2})^{\theta} - (2.5P_T^{2})^{\theta}]. \qquad (212)$$
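
The loudness function of Eqs. 210 and 212 is easy to evaluate numerically. The following sketch assumes the exponent of 0.27 quoted later in this section and the factor 2.5 of Eq. 208; the scale constant K is set to 1 and pressures are expressed relative to the threshold pressure, both of which are arbitrary choices made only for illustration.

```python
import math

THETA = 0.27            # exponent quoted in the text
INTRINSIC_RATIO = 2.5   # Eq. 208: intrinsic noise power relative to P_T^2


def loudness(p_s, p_t, p_ne=0.0, k=1.0, theta=THETA):
    """Loudness of a sinusoid heard in extrinsic plus intrinsic noise (Eq. 210).

    p_s  : effective sound pressure of the sinusoid
    p_t  : sound pressure at the threshold of audibility
    p_ne : effective sound pressure of extrinsic noise in the critical band
    k    : scale constant K (set to 1 here, an arbitrary choice)

    With p_ne = 0 the expression reduces to Eq. 212.
    """
    noise_power = p_ne ** 2 + INTRINSIC_RATIO * p_t ** 2
    return k * ((p_s ** 2 + noise_power) ** theta - noise_power ** theta)


p_t = 1.0  # hypothetical unit: pressures expressed relative to threshold
print("intrinsic noise re threshold, db:", 10 * math.log10(INTRINSIC_RATIO))
for level_db in (0, 20, 40, 60, 80):
    p_s = p_t * 10 ** (level_db / 20)
    quiet = loudness(p_s, p_t)
    masked = loudness(p_s, p_t, p_ne=10.0)  # extrinsic noise 20 db above p_t
    print(level_db, round(quiet, 3), round(masked, 3))
```

The factor 2.5 corresponds to about 4 db, the loudness is zero when no sinusoid is present, it is slightly positive at threshold, and at high levels it approaches the simple power function of Eq. 202.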


This result is similar to Zwicker's (1958), which was obtained in a different
way. The closed circles in Fig. 49 show that Eq. 212 accurately describes
the monaural loudness function for a 1000 cps tone presented in quiet.
A certain point in the derivation remains to be clarified, however. It was
assumed at the outset that the spectral power is limited to one critical
band. This is not true for the intrinsic noise, and the contributions of
bands outside the critical band centered at 1000 cps have to be considered.
According to Feldtkeller and Zwicker (1956) and other investigators, the
total loudness of an acoustic event of low power whose spectrum exceeds
one critical band is simply an arithmetic sum of loudness contributions of
all critical bands included in the spectrum. Consequently, Eq. 211 for
the loudness of a sinusoid in the presence of intrinsic noise, or any other
weak noise, has to be modified as follows:

Ls = Ki(Ps^+ - Pj.n + I - z (213) 

P 

The sums $\Sigma$ exclude the band containing the sinusoid. Since they are
identical but of opposite sign, they cancel each other out, and we see that
Eq. 213 is identical to Eq. 211.

At higher sound pressure levels, the situation is more complex Even 

bawl, a.wi thA e-LCitatioas. 

different bands partially cancel each other Under such conditions, it is 
still possible to write for the loudness contribution of a critical band to the 
loudness of a sinusoid heard in presence of random noise 

$$L_{Sk} = K[(P_{Sk}^{2} + P_{Nk}^{2})^{\theta} - P_{Nk}^{2\theta}], \qquad (214)$$

where $P_{Sk}$ is the effective equivalent sound pressure produced by the
sinusoid in the critical band k and $P_{Nk}$ is the effective sound pressure
produced by the masking noise in the same band. The total loudness of
the sinusoid amounts to

$$L_S = \sum_k K[(P_{Sk}^{2} + P_{Nk}^{2})^{\theta} - P_{Nk}^{2\theta}], \qquad (215)$$

in which the sum can be limited to those bands for which $P_{Sk} > 0$. As in
Eq. 213, terms contributed by other bands cancel each other out. Except
in the immediate vicinity of the masked threshold of audibility,

$$P_{Sk}^{2} \gg P_{Nk}^{2},$$

so that for high and moderate sensation levels, Eq. 215 may be approximated
by

$$L_S = \sum_k K(P_{Sk}^{2\theta} - P_{Nk}^{2\theta}), \qquad (216)$$

or, separating the sum terms,

$$L_S = \sum_k KP_{Sk}^{2\theta} - \sum_k KP_{Nk}^{2\theta}. \qquad (217)$$

In this equation, the first sum represents the total loudness of the sinusoid
in absolute quiet and the second sum represents the partial loudness of
noise that is contributed by the critical bands excited by the sinusoid.
The number of these bands is limited to one at the threshold of audibility
and increases with the sound-pressure level of the sinusoid. For noise
bands of moderate width and for sufficiently high sensation levels of the
tone, all bands that are excited by the noise are also excited by the tone,
and so the second sum of Eq. 217 is equal to the total loudness of the noise
when presented alone. Under such conditions, it is possible to write

$$L_S = L_{S0} - L_{N0}, \qquad (218)$$

where the subscripts 0 indicate the loudness of each sound presented
separately More accurately, according to Eq 207, we have 

i-s *= a: (219) 

The masked curves of Fig. 49 were obtained with a band of noise of
approximately one octave. Such a noise centered at 1000 cps should
produce a signal-to-noise ratio of approximately -10 db at the threshold
of audibility. For these conditions, Eq. 219 may be rewritten in the form

Ls = K [(P^a + - 1 9/»y2e] (220) 

Loudness values calculated by means of Eq 220 are plotted in Fig 49 

aL^tclv hJ expentnamal curves quite 

0^,; ^ sound pressures 

It ruav s elu r insignificant 

an e^errJlTv,'' ’"aerate sound presLe measurement, 

in the mathem '^t’ a* interaction not included 

LoreticaranT between the 

h o^ aUeast “> ^''date the 

tticoty, at least as a first order approximation 

mtoestaA"TT‘‘'“’"^ Eq 2i2, lead to an 

teresting conclusion concerning the threshold of audibility as defined 
for the purposes of this chapter. At the threshold, $P_S = P_T$, and Eq. 212
becomes

$$L_T = K[(3.5P_T^{2})^{\theta} - (2.5P_T^{2})^{\theta}], \qquad (221)$$

which means that $L_T > 0$. This is as it should be. By definition, the
signal can be perceived 50% of the time, and since there is no negative
loudness, the average loudness must be greater than zero.
The derivations of this section are based on the tacit assumption that
the primary loudness function. 


L =: 


is determined by a stimulus transformation in the end organ, and that the
operations of higher centers are limited to algebraic summation. This
postulate has been validated indirectly by the experimental verification of
the obtained equations. If it is not disproved by further research, it may
become of fundamental importance in understanding the functioning of
the brain.

As a corollary to the conclusion that higher centers are limited to
algebraic summation, we should expect loudness to be proportional to
the peripheral neural activity. Physiological experiments of Katsuki and
his co-workers (1962) provide an opportunity for a preliminary check in
this respect. Katsuki investigated the neural activity in the peripheral and
central auditory system of monkeys. He made a statistical evaluation of
thresholds in neurons of the first order and correlated these thresholds
with the dynamic response characteristics. Of particular interest is his
investigation of responses to preferred frequencies, that is, to frequencies
that are associated with the lowest threshold of a given neuron. Katsuki
found that these minimum thresholds are normally distributed within two
populations, a large population with low and medium thresholds and a
small population with high thresholds. In neurons with low thresholds,
the firing rate increases with the sound-pressure level slowly, and in those
with higher thresholds, more rapidly. On the basis of Katsuki's data, a
smooth curve can be drawn indicating the rate at which the firing increases
with sound-pressure level as a function of the threshold level. Using this
curve, a firing characteristic can be drawn for each class of neurons. A
somewhat idealized family of such characteristics is plotted in Fig. 50.
It shows a striking regularity. The responses grow approximately according
to a logarithmic function of sound pressure, and the slope increases
with the sound-pressure level at the threshold. This sound-pressure level
is indicated by the points where the curves cross the abscissa axis. The
preferred frequencies of the neurons, whose charactcnstics arc approxi- 
mated in fig 50, were included within the range of 600 and JOOOcps 
The numbers on the curxes indicate the percentage of neurons in each 




function was produced by human observers. In order to show that the
coincidence is unlikely to be fortuitous, we can introduce two additional
points of information. First, the sound-pressure level identified with the
threshold of audibility agrees with the level for which the response of
the most sensitive neurons in Katsuki's experiments begins to rise above the
spontaneous activity of these neurons. Second, the numerical constants
of the loudness function (Eq. 212) appear reasonable when applied to
physiological measurements. Assuming that the discrepancy between the
calculated neural response and the loudness curve at higher sound-pressure
levels stems from the omission of a portion of active fibers, we
can conclude that the total rate of neural firing obeys the same function
as loudness. Consequently, Eq. 212 should hold with the exception of the
numerical value of which is about 10* times larger than for loudness. 
We obtain for the total rate of neural firing 

+ 2 5Pr*)’ - (2 SPj.')'). (222) 



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where A^*(2 5P should be equal to the total spontaneous activity N^i 
of the considered group of fibers If the threshold is assumed to be the 
same as for human observers, the threshold sound pressure is 7 db above 
0 0002 dyne/cm=, that is, at 4 5 10 dyne/cm^ With this value, K* = 
3 1 10^ and 0 = 0 27, the spontaneous activity in a group of 100 fibers 

JVjv, = 530sec-' 


This result appears to be of the same order of magnitude as the spontaneous
activity observed by Katsuki. Some of the most sensitive fibers
investigated by him fired spontaneously at a rate of 70 per second, and it
is possible to estimate from his diagrams that 10 to 20% of the fibers
exhibited spontaneous activity. Thus Eqs. 212 and 222 describe accurately
the perceived loudness as well as the peripheral neural activity, and the
prior conclusion that the auditory nervous system above the level of the
end organ behaves like a linear system is confirmed. A further verification
can be found in an analysis of loudness as a function of stimulus duration
or repetition rate.


It is known that the loudness of short noise or tone bursts increases
with burst duration (v. Bekesy, 1929; Munson, 1947; Miller, 1948;
Garner, 1948). The auditory system behaves as if it integrated acoustic
power in a similar way to that found at the threshold of audibility. The
difference can be found in the time constant of integration,
which appears to decrease with sensation level (Miller, 1948). Thus,
loudness as a function of stimulus duration could be described by a set of
equations similar to that derived for the threshold. A more careful
cornnv" ” however, that the situation is more 

ofTwe’r'’"! proportional to the time integral 

DerfnrmeHf^iI!'°^if theory emerges from the analyses 

Md fcHo r of audibility as a function of stimulus duration 

tionarLte r"?' Provided that one addi- 

postulates accem H “dded At the same time, all 

mms marntarfh iT '’’apter and the resulting theo- 

thc loudness fii 'T *" the steady-state expression for 

remam IcTanru Tn i"" “f '™Potal summation 

below the levefnr 'i" ” 'tamentary neural excitation 

oelow the level of temporal integration is given by 


$$\varepsilon = K'[(P_S^{2} + 2.5P_T^{2})^{\theta} - (2.5P_T^{2})^{\theta}], \qquad (223)$$

a short stimulus produces at the level of integration an excitation

$$\eta = \varepsilon e^{-\alpha t} = K'[(P_S^{2} + 2.5P_T^{2})^{\theta} - (2.5P_T^{2})^{\theta}]\,e^{-\alpha t}, \qquad (224)$$

which decays exponentially with a time constant $1/\alpha = 200$ msec. In
Sec. 4, we derived an equation for intermittent stimuli and expressed it as
a ratio of excitations produced by an infinite series of bursts and one
burst. This equation has the form


’I. 

i)a 

or m terms of sound pressure. 


(225) 


(Ps," + 2 5PrY - (2 if yY 
Vb [(Pub* + 2 5P/)» - (2 SP/)'](: - 
For sufficiently high sound-pressure levels, it can be simplified to 


’Id Psn“(l-e-'*) 


(227) 


Equality of loudness is considered to be identical with the equality of 
cumulative excitation, so that r), = ijj, Under these conditions 


Puli’" _ 1 

Ps," 1 - e-"' ’ 

or in decibels, 

201og^ = i.0,og(^.) 

Since $\theta = 0.27$, $1/\theta \approx 3.7$, and the sound-pressure level changes 3.7
times as rapidly with increasing repetition rate as at the threshold of
audibility. This is in contradiction to direct experiments that result in a
considerably smaller slope. The disagreement is due to an important
omission in the derivation of Eq. 225. This derivation was based on the
assumption that, for equal elemental stimuli, the elemental excitations are
equal. Such an assumption is reasonable for near-threshold sound pressures,
but not for suprathreshold stimulation. McGill (1952) has
demonstrated that a preceding click depresses the neural response to a
following one. Peake, Goldstein, and Kiang (1962) measured the
magnitude of peripheral response as a function of repetition rate of short
noise bursts and found that it decreased rapidly with repetition rate as well
as with burst duration and sound pressure. Their results can be described
quite well by the empirical formula

e, = - e-*"). (230} 

where the constant $\varphi$ depends on sound pressure and burst duration.
Peake's results for one set of parameters are compared to values obtained
by means of Eq. 230 in Fig. 52. Agreement with other of his results is


(228) 

(229) 
Fig. 52. Amplitudes of averaged $N_1$ responses in the auditory nerve as a function of the
repetition rate of short noise bursts (Peake et al., 1962). The crosses and the curve show
a mathematical approximation of the data.

equally good or better. Under such conditions, Eq. 228 must be corrected
to

1 _ e-«. 

and we obmn in decibels 


20 log ^ 

Pc, 




Provided that the parameter ^ is known, Eq 232 determines values of 
mav bfr v"/™ ! The results of the calculation 

the loudtess ? TT' •’y (1957), who investigated 

duration , '"'h.tc no.se as a function of burst 

ftc mteLmem f the loudness of 

sound-pressure level'of w'^db P oh"' k”””" “ constant 

terms of HinVrpm,- j Pollack s results are shown in Fig 53 m 

the continuous n ' level between the intermittent and 

mL for three h„rs.'!f’ P'”""' “ “ 

S I pm^e on, ?o “’«.cate tLoreti- 

colw'^aTkts u L 1“'““ * l^"f°«unately, Peake and his 

Ie\cls than Pollack durations and lower sound pressure 

.5 not possible for P U l determination of the parameter <f> 

by ex tCS 0^ Th r ' Nevertheless, it can be estimated 

a srunTnts urn llv r°'' ^ '>«"«" “O “"d 

.he v,e,„,^y":^rio'dh; ° 
Fig. 53. Loudness level of an intermittent white noise as a function of burst repetition
rate and burst duration. The points have been obtained experimentally by Pollack
(1958); the curves are calculated. Adapted with permission from Pollack (1958, p. 182).

that these values produce differences in sound-pressure level that are in
reasonable agreement with those obtained by Pollack. Furthermore, it
should be noted that the position and the slope of the theoretical curves
are related to each other in approximately the same way as the corresponding
parameters of the experimental results. For low repetition rates,
the theoretical sound-pressure level decreases at a rate of approximately
3 db for each doubling of burst duration. This also is in agreement with
the experimental findings. The only effect that is not accounted for
theoretically is the slightly negative level difference at moderately high
repetition rates and for burst durations equal to or greater than 1 msec.
The effect may be due to a slow neural adaptation whose existence
was ascertained by several investigators and which was omitted from the
theory. For the purposes of this chapter, the effect is of little interest.
We can conclude, therefore, that the theoretical and the experimental data
of Fig. 53 are mutually consistent. When more relevant empirical results
are accumulated, the theory discussed in this section should make it
possible to calculate loudness as a function of spectral sound pressure
and temporal pattern, that is, for any acoustic stimulus.
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Theoretical Treatments of 
Selected Visual Problems 


The reader familiar with the visual literature knows that this is an area of
many laws and little order. Quantitative data on a variety of
psychophysical relations are plentiful, and the field is consequently an inviting
one for the systematist or model builder. In this chapter we shall review
a sample of the problems and the approaches that seem to us, for one
reason or another, to be of potential interest to the mathematical
psychologist. The review is by no means comprehensive nor is it intended as a
substitute for an introductory treatment for readers totally unfamiliar
with the area. The selection and presentation of material in this chapter
are intended rather to illustrate, for the reader who is not a specialist in
vision, issues for which, at best, only partial solutions have been developed.

For the first section on spatial variations of stimulation, we have
selected treatments of the Mach band effect, which is a special case of
spatial contrast, because it represents an instance in which the
mathematical treatment of a particular effect ought to have, but typically has not
had, a general influence on analyses of the whole class of brightness
contrast phenomena to which it belongs. It is, moreover, an example of
an old phenomenon that is currently of much interest, and one for which
the mathematical description of a hundred years ago (Mach) can be
compared and contrasted with some contemporary approaches and
formulations.

The second section of the chapter focuses on the problem of visual
flicker perception. This problem was selected to provide a good context
in which to compare points of view that concentrate, respectively, on
photochemical events at the peripheral receptor level (Hecht), on the
lawful biological characteristics of the total organism (Crozier), on specific
properties of neural behavior (Bartley and Pieron), and, finally, on the
translation of sensory problems to the language and analytical techniques
of systems engineering (Kelly).

The third section deals with different approaches toward a general
transformation that will describe the basic relations between light
stimulation that varies along the two physical continua of energy and wavelength
or that varies in spectral composition and the three-dimensional
psychological domain of perceived color space. This topic provides a broad
framework in which to examine a variety of matters of fundamental
importance to the model builder in the area of color vision and to look
somewhat more critically at the kinds of assumptions, interpretations, and
experiments that are involved in the development of visual theories.


1 SPATIAL VARIATIONS OF STIMULATION


Mathematical analyses of visual problems may be categorized in any
of a number of ways, for example, in terms of the kind of model employed,
the class of phenomena considered, the kinds of data incorporated, and
so on. To take a specific example, “Mach bands” represent a striking
instance of the class of visual brightness phenomena more generally
described as contrast and edge effects that depend on certain
characteristics of the spatial distribution of stimulus luminance. The term “Mach
bands” refers to the perception of sharp brightness reversals (contours) in
spite of the fact that the associated stimulus variation is a monotonic
spatial gradient of luminance. Ernst Mach (1865, 1866a, 1866b, 1868a,
1868b, 1906), who called attention to the phenomenon, analyzed the effect
in terms of a visual brightness response that followed the second derivative
of the stimulus distribution, which does indeed show sharply defined
maxima and minima at the locations where bright or dark contours are
perceived visually.
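The second-derivative account can be illustrated numerically. The sketch below is not taken from Mach's analysis: the piecewise-linear luminance profile, the sample spacing, and all function names are assumptions chosen only for illustration. It approximates the second derivative of a monotonic luminance ramp by a central second difference and locates its extrema, which fall at the two knees of the ramp.

```python
def luminance(x, x0=2.0, x1=4.0, low=1.0, high=3.0):
    """Hypothetical stimulus: flat, then a linear ramp from x0 to x1, then flat."""
    if x <= x0:
        return low
    if x >= x1:
        return high
    return low + (high - low) * (x - x0) / (x1 - x0)


def second_difference(values, dx):
    """Central second difference, a discrete stand-in for the second derivative."""
    return [
        (values[k - 1] - 2.0 * values[k] + values[k + 1]) / dx ** 2
        for k in range(1, len(values) - 1)
    ]


dx = 0.5
xs = [k * dx for k in range(13)]          # positions 0.0 ... 6.0
lum = [luminance(x) for x in xs]          # monotonic: never decreases
d2 = second_difference(lum, dx)           # d2[k] belongs to position xs[k + 1]

# Positions of the extreme values of the second difference.
knee_positive = xs[1 + d2.index(max(d2))]  # foot of the ramp
knee_negative = xs[1 + d2.index(min(d2))]  # top of the ramp

print("second difference is largest at x =", knee_positive)
print("second difference is smallest at x =", knee_negative)
```

Although the luminance itself never reverses direction, the second difference is nonzero only at the two knees, which is where the contours are reported. Under the usual reading of the effect, the positive extremum at the foot of the ramp corresponds to the dark band and the negative extremum at the top to the bright band.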

As an approximate descriptive relation, Mach first (1865) suggested
the following formula for the perceived brightness e of a surface whose
illumination i varies only in the x direction of the x, y retinal stimulus
plane*


In the formula, the Fechnerian psychophysical law is assumed, and a and
b are constants. This description implies that the local brightness depends
only on the curvature of the physical light distribution at that spatial
,?r provision for an influence on the local brightness 

s„cl!Tn'‘ “ To provide for 

element '"f **’'v*' decreases with distance r from the local surface 

approLatir 


'I + fa/Vj) + 1 

' L (1/r,) + (1/r,) + (llr,) + J - J ,,/r, 

• In the following equations Maeh s onginnl * 
In a fourth paper (1868a), he stated the more general relation

S Ac 

Here, e represents the excitation at the position of stimulus intensity i,
Δy represents each element of the remaining surface, φ is a function of the
distance r of each element from the e position, and i′ is the stimulus
intensity associated with each element Δy. The distance function φ(r)
can be thought of as made up of two components, one that drops very
rapidly with distance over short distances and the other that has a slower
rate of change over larger separations. We do not reproduce here Mach’s
mathematical development to derive the specific form of this distance
function, but it remains the most detailed analysis of the problem to date.

Mach saw clearly, moreover, the broader implications of the phenomena
that he was attempting to analyze. They implied to him, first, that spatial
distributions of perceived brightness are not geometrically similar to
spatial distributions of luminance, and second, that the observed
psychophysical transformations require mechanisms of physiological interaction
among excited elements of the visual tissue. This was almost a hundred
years ago, but only recently has there been a renewed interest in the Mach
bands and edge effects and along with this interest a new series of attempts
at their mathematical description and generalization. Among the more
recent suggestions, Ludvigh (1953) has proposed that, in addition to the
dependence of the center of the perceived band or doublet on the second
derivative of the stimulus distribution i = f(x, y) at the retina, there is a
dependence of the width of the band on the fourth derivative of this
function.

Lowry and dePalma (1961) have used the Mach band phenomenon to
study the transfer function, or sine-wave response function, of the visual
system. The concept of sine-wave response is used to evaluate the fidelity
of optical systems and entails a convolution of the object luminance
distribution function with the line spread functions of the image-forming
elements. It presumably represents a more useful measure than the older
techniques that depend on determinations of the limits of resolution of the
system. The psychophysical correlate of the older technique would be
the determination of threshold contrast sensitivity or visual acuity. The
Mach band phenomenon, however, is a suprathreshold effect that relates
more directly to the characteristics of optical imagery that the newer
transfer function approach is intended to define. In essence, the procedure
used by Lowry and dePalma involves the use of a specially constructed
visual photometer to match the variations in apparent brightness at a
series of positions along a known spatial gradient of luminance that gives
rise to the Mach effects. The matching, or apparent, luminance values
define the image distribution function I(x') generated by the objective
luminance distribution O(x) through the transfer function of the visual
system. Mathematical analysis of the data is accomplished by taking the
Fourier transforms, O*(ν) and I*(ν), of the object and image distributions.
The sine-wave response, A*(ν), is derived by dividing I*(ν) by O*(ν). The
amplitude function is |A*(ν)|. The function determined in this way reflects
the characteristics of the whole visual system, that is, the optical
characteristics of the eye and the properties of the physiological response
mechanisms. To what extent the function determined by this kind of
analysis can be given more general application to describe the
characteristics of visual contour perception remains to be determined.
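The division of transforms described above can be sketched in a few lines. This is an illustration under stated assumptions, not Lowry and dePalma's procedure or data: the object distribution, the three-point spread function, the use of circular convolution on 16 samples, and every name in the code are hypothetical. The "image" is simulated by blurring the object, and the sine-wave response is then recovered as the ratio of the two discrete Fourier transforms.

```python
import cmath
import math


def dft(samples):
    """Discrete Fourier transform of a list of real or complex samples."""
    n = len(samples)
    return [
        sum(s * cmath.exp(-2j * math.pi * k * m / n) for m, s in enumerate(samples))
        for k in range(n)
    ]


def circular_convolve(signal, kernel):
    """Circular convolution, standing in for blurring by a spread function."""
    n = len(signal)
    return [sum(signal[(m - j) % n] * kernel[j] for j in range(n)) for m in range(n)]


n = 16
# Hypothetical object luminance O(x): a bright bar on a dim background.
obj = [1.0 if 4 <= m < 9 else 0.2 for m in range(n)]
# Hypothetical symmetric spread function of the whole system (weights sum to 1).
spread = [0.0] * n
spread[0], spread[1], spread[-1] = 0.5, 0.25, 0.25
# Simulated matched-luminance image distribution I(x').
image = circular_convolve(obj, spread)

obj_ft, image_ft = dft(obj), dft(image)
# Sine-wave response A*(v) = I*(v) / O*(v); its modulus is the amplitude function.
amplitude = {
    k: abs(image_ft[k] / obj_ft[k])
    for k in range(n // 2 + 1)
    if abs(obj_ft[k]) > 1e-9          # skip frequencies absent from the object
}

for k, value in amplitude.items():
    print(f"frequency index {k}: |A*| = {value:.3f}")
```

In this toy case the recovered amplitude function falls smoothly from 1 at zero frequency to 0 at the highest sampled frequency, which is simply the transform of the assumed spread function. With real matching data the ratio would be noisy wherever the object transform is small, which is why the sketch skips near-zero denominators.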
Bliss and Macurdy (1961) compared two alternative models to account
for the Mach band phenomenon, a “physiological” one based on Hartline
and Ratliff’s (1958) study of the laws of inhibitory interaction in the eye of
the Limulus, and a “psychological” one based on v. Békésy’s (1960)
concept of the “neural unit” and a “funneling” process in the skin, ear,
and visual systems. They apply the techniques of linear-system analysis
to relate the two models as discrete systems, using two-sided z-transforms
of the space functions of excitation, response, inhibition, and neural
unit. The development shows that the transform of the neural unit is the
reciprocal of the transform of the inhibition function. Related continuous
systems are described by Fourier transforms, and Gaussian blurring
functions are assumed to convert the objective light distribution to the
stimulus distribution at the retina. The spatial distribution of brightness
response derived from this approach for a given object luminance
distribution is shown by Bliss and Macurdy to exhibit the main features of the
Mach effect and is compared with the response distribution that is
generated by the use of Mach’s early (1865) formula.
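The reciprocal relation between the two transforms can be checked on a small discrete example. The sketch below is not Bliss and Macurdy's model: it assumes a ring of 32 receptors, a nearest-neighbor inhibitory coupling of 0.2, an inhibition function that includes a unit self term, and a discrete Fourier transform in place of the two-sided z-transform (the DFT samples the z-transform on the unit circle for a periodic sequence). All names and numbers are hypothetical.

```python
import cmath
import math


def dft(samples, inverse=False):
    """Forward or inverse discrete Fourier transform."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = [
        sum(s * cmath.exp(sign * 2j * math.pi * k * m / n) for m, s in enumerate(samples))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def excitation_at(m):
    """Hypothetical excitation: low plateau, ramp up, high plateau, ramp down."""
    if 8 <= m <= 12:
        return 1.0 + 0.5 * (m - 8)
    if 12 < m <= 20:
        return 3.0
    if 20 < m < 24:
        return 3.0 - 0.5 * (m - 20)
    return 1.0


n = 32
excitation = [excitation_at(m) for m in range(n)]

# Assumed inhibition function: the response satisfies
#   response[m] + 0.2 * (response[m - 1] + response[m + 1]) = excitation[m].
inhibition = [0.0] * n
inhibition[0], inhibition[1], inhibition[-1] = 1.0, 0.2, 0.2

# Reciprocal relation: transform of the neural unit = 1 / transform of the inhibition.
neural_unit_ft = [1.0 / h for h in dft(inhibition)]
response_ft = [e * u for e, u in zip(dft(excitation), neural_unit_ft)]
response = [v.real for v in dft(response_ft, inverse=True)]

# Consistency check: passing the response back through the inhibition
# function should return the original excitation.
recovered = [
    sum(inhibition[j] * response[(m - j) % n] for j in range(n)) for m in range(n)
]

print("upper knee vs. middle of high plateau:", round(response[12], 3), round(response[16], 3))
print("lower knee vs. middle of low plateau: ", round(response[8], 3), round(response[0], 3))
```

The response computed through the reciprocal transform overshoots at the upper knee of the ramp and undershoots at the lower knee, which is the qualitative signature of the Mach bands. The coupling strength and neighborhood size were picked for simplicity and are not estimates of any physiological quantity.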

'inni C’^amples arc cited mainly as illustrations of a current trend to 
r’’ “ from other d.sc.pUnrs (phys.ology, 

oa hoihT '"S'^nng) to the analysts or mathemat.cal deLlpt.on of 
S could h T rL'’ '™ 'nstances. and m many others 

analvos t '’‘'•It' visual system are evaluated by 

S d set or"® P”'™"’'"™ of obtLed for an expressly 
restricted set of experimental conditions 

«eTs ofThe hlh??' ■nnu="ced by many param- 

J™nol d mh a"'''"'*'"® size, spatial d.strLtion, 

vanabtecar^L Z"’ composition'^ These stimulus 

vanab « Tuch^rTr »'•!' a variety of organismic 

and emn’incal nsv cab state ofvisual adaptation, 

empirical psychophysical relations that interrelate many of these 
variables for a variety of criterion responses have been determined and
redetermined to create a literature of enormous proportions.
Broadly comprehensive models are clearly needed to integrate this
material, but such treatments are rare. Comprehensive treatments stated
with sufficient precision to handle the data quantitatively are still more
rare.


2 TEMPORAL VARIATIONS OF STIMULATION 


The best known attempt to provide a precisely stated comprehensive
theoretical model of vision is that of Selig Hecht. Hecht’s theory, which
was first presented in detail in the early 1930’s (Hecht, 1934), was based
almost exclusively on the use of equations intended to describe the
photoreceptor process. Many specific details are now of historical
interest only. Nonetheless, Hecht’s work continues to deserve careful
study, for it contains a summary of major psychophysical relations that
still need to be accounted for and provides detailed examples of both the
advantages and shortcomings of this kind of analytical approach. Other
photochemical formulations have been presented, but Hecht’s extensive
and systematic work has unquestionably been the most influential.
Hecht summarized the events underlying the photoreceptor process as
composed of primary and secondary reactions in a coupled system that
involve, essentially, a photochemical change with absorption of light, the
triggering of a nerve impulse by the energy release, and a replenishment of
the photosensitive substance. In simplest paradigm form, the first part
of the receptor process can be represented as a completely reversible
system

$$ S \;\underset{\text{dark}}{\overset{\text{light}}{\rightleftharpoons}}\; P + A $$

where S is the primary photosensitive substance and P and A are
photolytic products. Since the system is, in principle, a reversible one,
continuous steady illumination establishes a stationary “equilibrium” state
where the rate of photolysis of S is balanced by the regenerative, dark
formation of S from P and A. The equation of this steady state is


$$ KI = \frac{x^{n}}{(a - x)^{m}} $$


where I is light intensity, a is the initial concentration of the
photosensitive substance S, x is the concentration of photoproducts or the
fraction of S that is changed to P and A at equilibrium, and K is the
equilibrium constant of the system and represents the ratio of $k_1$,
the photochemical velocity constant, to $k_2$, the dark-reaction velocity
constant. The exponents m and n relate, respectively, to the order
(monomolecular, bimolecular, etc.) of the primary photochemical process and of
the dark reaction.
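To make the steady-state relation concrete, the following sketch solves it numerically for the photoproduct concentration x. It assumes the steady-state form in which the product K·I equals x^n / (a − x)^m, with m the order of the photochemical process and n the order of the dark reaction, as the definitions above imply. The parameter values, the bisection solver, and the function name are hypothetical and are not Hecht's.

```python
def steady_state_x(K, I, a=1.0, m=1, n=1, tol=1e-12):
    """Solve K*I = x**n / (a - x)**m for x in [0, a) by bisection.

    The right-hand side rises monotonically from 0 toward infinity as x goes
    from 0 to a, so there is exactly one root for any K*I > 0.
    """
    target = K * I
    lo, hi = 0.0, a
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid ** n / (a - mid) ** m < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical numbers: K = 2, I = 0.5, so K*I = 1.
x_first_order = steady_state_x(K=2.0, I=0.5, m=1, n=1)   # x / (a - x) = 1
x_bimolecular = steady_state_x(K=2.0, I=0.5, m=1, n=2)   # x**2 / (a - x) = 1

print("m = 1, n = 1:", round(x_first_order, 6))
print("m = 1, n = 2:", round(x_bimolecular, 6))

# Raising the intensity pushes the steady state toward complete photolysis.
for intensity in (0.1, 1.0, 10.0, 100.0):
    print(intensity, round(steady_state_x(K=2.0, I=intensity, m=1, n=2), 4))
```

For m = n = 1 the root has the closed form x = a·KI / (1 + KI), which the bisection result can be checked against. The intensity sweep shows x approaching a as I grows, the saturation that any fit of this equation to psychophysical data relies on.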

The essential hypothesis for Hecht’s whole development was that a
particular criterion response is associated with a fixed change in
photochemical concentration in the sensory receptors from one steady state to
another. Thus, for example, brightness discrimination between a given
intensity I and a second I + ΔI is determined by a constant difference
$x_2 - x_1$ in the two steady-state equations


$$ KI = \frac{x_1^{n}}{(a - x_1)^{m}}, \qquad K(I + \Delta I) = \frac{x_2^{n}}{(a - x_2)^{m}} $$

On this kind of conceptual basis, Hecht was able to derive theoretical
functions which gave, on the whole, rather good fits to a variety of
psychophysical data relating stimulus intensity to brightness
discrimination, instantaneous thresholds, visual acuity, and critical flicker fusion,
for the time course of bright and dark adaptation for rod and cone vision,
and also, with additional assumptions and the postulation of a specific set
of spectral absorptions for three different cone substances, for data of
color mixture and color discrimination.
This integrative account of many of the data of vision and Hecht’s
derivation of their quantitative properties from an analysis of the initial
photoreceptor processes was an impressive achievement. It gave impetus
to further development and refinements of photochemical models (Jahn,
1946; Moon & Spencer, 1945) and the impact of this kind of approach
still continues (Cornsweet, 1962).

H^echt's conceptual approach has, however, been subjected to consid- 
nhoiL" r PredoTiinant role assigned to 

o? been questioned, and neural aspects 

l>“ve received increasing emphasis 

Prob’ablv^hf & Ornstcin, 1939, Thomson, 1950) 

critics wL W T r formidable of Hecht’s 

He ern ipTr His objections were many 

nauon Toiv’e discrimi- 

of visual nerfnr ^ ^ Organism, the quantitative properties 

ftc pronerUef„rT“ determined exclusively by 

(Crazier, Wolf. & imSwolf 79^0)“ pVrJh 
..VC properties of intensity discV.minaL"7rr~urt7e%\r‘t 
different modes of sensory excitation in vision, audition, kinesthesis,
somesthesis, etc. (Crozier, 1936; Crozier & Holway, 1937, 1938, 1939;
Holway & Crozier, 1937), and since this is true for quite different
organisms (Crozier, Wolf, & Zerrahn-Wolf, 1937a), Crozier was most
dubious that the specific physicochemical mechanism of excitation is
derivable from the discrimination functions (Crozier, Wolf, &
Zerrahn-Wolf, 1937b, 1937c). Hecht’s success at curve fitting for visual data was
attributed by Crozier to his ad hoc manipulation of constants. If these
constants were allowed to vary only in ways consistent with their presumed
photochemical significance, then, Crozier pointed out, they failed to
predict correctly the outcomes of specific experimental manipulations
(Crozier, Wolf, & Zerrahn-Wolf, 1937d).

To apply his steady-state equation to flicker data, for example, Hecht
assumed that the stationary state is reached at critical fusion frequency.
Since the luminance of an intermittently flashing stimulus that appears
continuous is decreased by the fraction of dark-phase time in a single cycle
and since by simply adjusting the luminance of the flash proportionately,
we can make a brightness match to a continuous stimulus of the original
luminance, time and intensity are clearly reciprocal and interchangeable.
In the flicker experiments, when critical fusion frequency is reached at
a given luminance level and flicker has disappeared, we have a steady state.
At this equilibrium point the rapidly alternating light and dark reactions
of equal duration are in balance; what has been decomposed during the
light period is regenerated during the dark period. The steady-state
equation


$$ KI = \frac{x^{n}}{(a - x)^{m}} $$


applies here. With the additional assumption that flicker frequency for
fusion depends directly on the concentration x of the photoproducts during
a light flash, the flicker data can be described by the relation



$$ KI = \frac{F^{n}}{(F_{\max} - F)^{m}} $$


Here, the critical flicker rate F for fusion at any luminance is substituted
for the photochemical concentration x at that luminance, and the maximal
critical flicker frequency $F_{\max}$ for the given parameters is substituted for
the maximal amount of photosensitive material, a. In this relation, we
remember that K does have biological significance since it represents the
ratio $k_1/k_2$ of the photochemical and dark-reaction velocity constants.
The constant $k_1$ for the photochemical change must have a low
temperature coefficient, whereas $k_2$ for the dark reaction must have a much larger
temperature coefficient. Crozier sought to evaluate Hecht’s system
experimentally by studying the critical flicker frequency dependence on stimulus
intensity when the temperature of the tested organism is varied (Crozier,
Wolf, & Zerrahn-Wolf, 1937b, 1937c). With a decrease in temperature,
$k_2$ decreases and the ratio $k_1/k_2$ must increase. If $F_{\max}$ is independent
of temperature (an experimentally established fact), then at any fixed
frequency F the critical illumination I must vary in the same sense as the
temperature variation. Instead, the data show that the relation is an
inverse one. Flicker measures on the insect Anax, a vertebrate sunfish
Enneacanthus, and the turtle Pseudemys all show constancy of the $F_{\max}$
value, an invariant form of the CFF dependence on log I, but a
displacement of the functions toward lower rather than higher intensities with an
increase in temperature (Crozier, Wolf, & Zerrahn-Wolf, 1937b, 1937c,
1938b). Other discrepancies occur in evaluating Hecht’s model with
respect to variations in the light-to-dark time ratios of the flickering
stimulus (Crozier, Wolf, & Zerrahn-Wolf, 1938c), and all in all Crozier
was able to amass a rather impressive body of data to justify his assertions
that Hecht’s use of quantitative formulas to describe flicker functions fell
far short of demonstrating that the functions reflected in any simple way
the quantitative properties of retinal photochemistry.
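The direction of the temperature prediction can be worked through with a short calculation. The sketch assumes the flicker form of the steady-state relation, in which K·I equals F^n / (F_max − F)^m with K the ratio of the photochemical to the dark-reaction velocity constant. The velocity constants, exponents, and frequencies below are invented for illustration and carry no empirical weight.

```python
def critical_intensity(F, F_max, K, m=1, n=2):
    """Intensity I required for fusion at frequency F, from K*I = F**n / (F_max - F)**m."""
    return F ** n / (K * (F_max - F) ** m)


# Hypothetical velocity constants. The photochemical constant k1 is taken as
# temperature independent; the dark-reaction constant k2 falls on cooling.
k1 = 1.0
k2_warm = 2.0
k2_cool = 1.0

K_warm = k1 / k2_warm
K_cool = k1 / k2_cool          # cooling raises K = k1 / k2

F, F_max = 30.0, 50.0          # F_max assumed not to change with temperature
I_warm = critical_intensity(F, F_max, K_warm)
I_cool = critical_intensity(F, F_max, K_cool)

print("predicted critical intensity, warm:", I_warm)
print("predicted critical intensity, cool:", I_cool)
```

With these assumptions the model requires less light at the lower temperature, so the predicted critical intensity moves in the same direction as the temperature. The data cited in the text show the opposite displacement, which is the core of Crozier's objection.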

In the last decade or so considerable additional evidence has cast further
doubt on some of the basic premises of photochemical theories of vision.
There have been large strides in photopigment analysis (Dartnall, 1957),
advances in electrophysiological techniques (Granit, 1962), new
developments in reflection densitometry (Rushton et al., 1955; Weale, 1953), and
increased concern with the quantum aspects of the visual stimulus
(Baumgardt, 1949, 1959; Pirenne, 1962).

In essence, no simple relation has been demonstrated between pigment 
concentration and measured visual sensitivities as was originally assumed 
between pholopigmcnt concentrations and discrimina- 
been foun^h T b=l’=«v.orally or electrophysiologieally) have 

a«^L?rrnnT M T'' the d,screpane,es 

' '^“"“"'’jelin, & Zcw. 1939, Rushton & Cohen; 1954) 
contrast ,n 't«‘™="t of visual data is in interesting 

with visual nr H ’ P'’°t°t^h™ical model analysis Crozier’s concern 
Telw eontrof r"” -ntorest in the 

tnvamn, fca lf ^peciHcat.on of those 

of the orinnistn (c' °'V' '’'■'‘toblc biological characteristics 

?uncho”;f;7”„ f 1934) He sought to develop a 

described bv ration 1 ^ determining relations that could be 

rnrconL^y Z ' r"' r ^gniftcant vanables 

constants. Thus, for Crozier, the functional relations might, in some
instances, reveal fundamental differences that had their basis in different
properties of the peripheral sensory receptors, as in the “rod” and “cone”
branches of many visual functions, but other features of these same
relations might have nothing to do with the specific properties of the
sensory end organs (Crozier & Wolf, 1938, 1944).

The concept crucial to Crozier’s analytical approach is that of biological
variability; thus any given stimulus value is associated with a population
of excitatory effects in the neural centers, and any measure of sensory
discrimination between two stimulus values is determined by the
characteristics of the two associated population distributions. The biological
response variability is, of course, measured by the dispersion of stimulus
values for a constant criterion response, and hence Crozier’s analyses of
psychophysical discrimination functions always involved measures of
dispersion as well as of mean stimulus values (Crozier, 1935, 1936).

When the analysis is applied, say, to the problem of critical flicker
frequency, the relevant data are not simply the mean flicker values
related to mean stimulus light intensities for marginal flicker recognition
or fusion. They are, rather, the measures $F_m \pm \sigma_F$ or $I_m \pm \sigma_I$ that
determine the band margins of the flicker contour. Response to flicker is
regarded as a limiting case of intensity discrimination involving the
statistical comparison of neural effects of excitation during the light phase
with their aftereffects during the dark phase, and the function relating
critical flicker frequency to light intensity has the characteristics of a log
Gaussian probability integral

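$$ F = \frac{F_{\max}}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\log I} \exp\!\left[-\frac{(\log I - \log I_0)^{2}}{2\sigma^{2}}\right] d(\log I) $$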

(Crozier & Wolf, 1940b). F is the critical flicker frequency, $F_{\max}$ the
maximal frequency at which the function becomes asymptotic, I the
critical illumination at frequency F, $I_0$ is the critical illumination at the
inflection point, where F has a value equal to $F_{\max}/2$, and $\sigma = \sigma_{\log I}$ is the
standard deviation for flicker recognition. The dispersion of the critical
illumination for flicker recognition is a variability measure of special
interest for, like other visual functions that depend on intensity
discrimination, this dispersion is directly proportional to the mean difference in
illumination required for the criterion response. In Crozier’s view, it
reflects the spread of excitabilities of the central response system and
defines a band of constant width $\pm\,\sigma_{\log I}$ in the functional relations
between critical illumination I and flicker frequency. The $F_{\max}$ parameter
is taken to be related to the total population of elements of sensory effect
that are capable of involvement in the discrimination of flicker, and F is
proportional to the additive effects of elemental excitations at a given level
of illumination I. The $I_0$ parameter (also represented by the symbol
τ′ in many of Crozier’s publications) is interpreted as a measure of relative
mean excitability.
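A log Gaussian probability integral of this kind is easy to evaluate with the error function. The sketch below is an assumption-laden illustration rather than Crozier's fitted function: it takes the flicker contour to be the maximal frequency multiplied by the cumulative normal distribution of log I, uses base-10 logarithms, and plugs in invented parameter values. The function and argument names are hypothetical.

```python
import math


def crozier_flicker(I, F_max, I0, sigma):
    """Critical flicker frequency as F_max times a cumulative Gaussian in log10(I).

    I0    : illumination at the inflection point of the contour
    sigma : standard deviation of the distribution, in log10 units
    """
    z = (math.log10(I) - math.log10(I0)) / sigma
    return F_max * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical parameters for a single receptor population.
F_max, I0, sigma = 50.0, 10.0, 1.0

for intensity in (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0):
    F = crozier_flicker(intensity, F_max, I0, sigma)
    print(f"I = {intensity:>8}: F = {F:6.2f}")

# Changing I0 alone slides the whole contour along the log I axis
# without altering its shape or its asymptote.
shifted = crozier_flicker(100.0, F_max, I0=100.0, sigma=sigma)
print("F at the new inflection point:", shifted)
```

At I equal to the inflection-point illumination the function returns exactly half of the maximal frequency, and it flattens toward the maximum at high intensities. The three parameters act independently: the maximum sets the asymptote, sigma the steepness, and the inflection-point illumination the position on the log I axis.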

Crozier’s flicker studies were designed to establish the characteristics
of the flicker contour for various animal species and to analyze the effects
of specific experimental manipulations on the various parameters of the
flicker function. All these parameters may differ from one animal species
to the next, and in a given individual they differ for stimulation of separate,
nonhomogeneous receptor populations, that is, the rods and cones of duplex
retinas (Crozier, Wolf, & Zerrahn-Wolf, 1938c). For a given ratio of
light-to-dark time in the flicker cycle, Crozier was able to demonstrate the
“organic invariance” of the $\sigma_{\log I}$ and the $F_{\max}$ parameters as specifically
heritable attributes subject to genetic variation as shown by cross-breeding
experiments (Crozier, Wolf, & Zerrahn-Wolf, 1937a; Crozier & Wolf,
1939). The $\sigma_{\log I}$ and the $F_{\max}$ parameters are interpreted as stable, limiting
characteristics of the organism and consequently not susceptible to
manipulation by changes in the momentary state of a given organic system.
Thus, for example, temperature variations that alter the excitation levels
of the response elements do not affect either the spread or the total
population of excitable effects, and $\sigma_{\log I}$ and $F_{\max}$ are invariant with temperature
changes. The $I_0$ parameter, however, which is interpreted as an inverse
measure of relative excitability, is systematically dependent on temperature
changes (Crozier, Wolf, & Zerrahn-Wolf, 1938b).

At a given temperature, as the light-dark ratio is changed, $\sigma_{\log I}$ remains
unchanged, but both $F_{\max}$ and $I_0$ vary, which is consistent with the
interpretation of the $F_{\max}$ parameter as an index of the total population of
elements available for “involvement in” the discrimination (Crozier &
Wolf, 1940a, 1941). Here Crozier was making use of the concept of
“neural availability” proposed by Holway and Hurvich (1937, 1938) to
account for their discrimination data in both kinesthesis and vision.

^ '’“s, increases with a decrease in relative light time “because the 
succession of briefer flashes and longer dark time enables more elements 
(Crr!: excitable state, just as with larger visual areas” 

innit/.n ° ^ Zerrahn-Wolf, 1938d) Moreover, since temperature 

in th(* excitation essentially by mfluencmg the velocities of reactions 
rat n nr r M ‘>'Pe«dencc of the flicker contour on the 

"ih systemafcally 


'ngcnious empirical manipulations of Ihc 
Itic analvsis nr ? function and of the complexities that emerge in 

of siimtn cnpecics diircrcnccs and in the human data for a variety 

of stimulus variables „ beyond the scope and purpose of this chapter 
Reports of most of this work can be found in the Journal of General
Physiology in a series of papers published between 1940 and 1945. Our
intent here is simply to indicate the nature of his approach and the potential
of this kind of probability model as a powerful and broadly applicable
analytical tool in the study of visual data (Crozier & Wolf, 1942).

Much more typical in the visual literature is the kind of analysis that
attempts to “explain” the psychophysical data in terms of specific
characteristics of neural response. Bartley (1959) probably best exemplifies this
approach with respect to visual flicker data. He has been particularly
concerned with the so-called brightness enhancement effect that occurs at
relatively low stimulus frequencies, and he has attempted to integrate the
neurological findings as related to the various flicker data in terms of his
major conceptual formulation, “the alternation of response theory.”
Bartley’s experimental work on the relation of the light-dark ratio to
critical flicker frequency has led him to question the usual assumption
that flicker is eliminated at only one critical light-dark ratio for a given
stimulus intensity and flicker rate. He accounts for his findings with the
assumption that “off responses” may trigger flicker perceptions under
some conditions (Bartley, 1961).

Pieron (1961) has also recently given an interpretative theoretical
account of the complex visual flicker situation in relation to
neurophysiological evidence at various levels in the optic pathway. The perceptual
uniformity or nonuniformity of an intermittent stimulus is, in his view,
determined by the behavior of the slow excitatory retinal potential and
the activity of what Pieron calls the “on-with” discharge units whose
frequency is proportional to this potential. Whether these potentials are
uniform, oscillatory, or discrete depends on whether the photons or light
quanta (regardless of their time distribution) fall within the critical duration
τ (the duration within which time and intensity are reciprocally related).

If they do, the slow potential is uniform and there is no perceptible
flicker; since this critical duration decreases with a logarithmic increase in
the stimulus luminance, critical flicker frequency increases directly with
log luminance in accordance with the well-known Ferry-Porter law
F = k log I + a. If τ is exceeded and there is only partial time-intensity
integration, the slow potential oscillates and the neural impulse frequency
is such that flicker is perceived. If the pulsing stimuli are still longer and
exceed the “utilization time” for even partial time-intensity integration,
then the potentials are independent and discrete light pulses are perceived.
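The Ferry-Porter law cited above is a straight line in log luminance, which the following sketch evaluates and inverts. The slope k, the intercept a, the choice of base-10 logarithms, and the function names are assumptions made for illustration. They are not values reported by Pieron or in the studies he discusses.

```python
import math


def ferry_porter(I, k, a):
    """Critical flicker frequency predicted by F = k * log10(I) + a."""
    return k * math.log10(I) + a


def luminance_for_cff(F, k, a):
    """Invert the law: luminance at which fusion occurs at frequency F."""
    return 10.0 ** ((F - a) / k)


# Hypothetical constants: 10 cycles per second per log unit, 30 at unit luminance.
k, a = 10.0, 30.0

for I in (0.1, 1.0, 10.0, 100.0):
    print(f"I = {I:>6}: F = {ferry_porter(I, k, a):5.1f}")

# Each tenfold increase in luminance adds k to the critical frequency.
gain_per_log_unit = ferry_porter(100.0, k, a) - ferry_porter(10.0, k, a)
print("gain per log unit:", gain_per_log_unit)
print("luminance for F = 45:", round(luminance_for_cff(45.0, k, a), 3))
```

The law holds only over a middle range of luminances in practice, so a sketch like this should not be extrapolated to very dim or very bright stimuli.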
Pieron has been particularly concerned with resolving apparent
discrepancies in the literature concerned with the dependence of flicker on the
light-to-dark ratio. His theoretical account seeks to demonstrate that
the “contradictory” empirical data are necessary consequences of the
way the stimulus luminance is controlled in the different experiments,
that is, whether the luminance of each pulse is the same for different
light-to-dark time ratios or whether compensatory adjustments are made
to maintain a constant average luminance for the different pulse durations.
Pieron is sharply critical of the increasing recent trend to emphasize the
wave form of the stimulus variations in analysis of visual flicker responses.
Since the process of physiological excitation does not measure or mirror
the stimulus wave pattern, Pieron regards the application of Fourier
analysis to the complex stimulus wave forms and the generated harmonics
as irrelevant to an understanding of flicker phenomena.
There is, nevertheless, much of interest in the use of harmonically pure
stimulus wave forms to investigate the temporal characteristics of the visual
process (de Lange Dzn, 1954; Levinson, 1959). The background of this
kind of experimental and analytical study lies in engineering. Within this
conceptual framework, the problem is translated into one of information
processing, with stimuli as “signals,” the visual mechanism as a light
transducer, and threshold determinations as an application of the
“driving-point null” method. A good example of systems analysis applied to the
contained in Kelly’s recent work (1961a, 1961b, 


In his experiments with time-dependent stimub, Kelly employed pn- 
edgeless” Ganzfeld (to minimize spatial interaction 
enects) and harmonically pure, smusoidaUy flickering lights instead of the 
MnTf . conventionally used in flicker experiments 

Modulation amphtude thresholds are measured at a senes of fixed 
constant time-average radiances Functions relating 
“““"S' modulahon frequency for white light at a senes of 
adaptation levels show that amphtude sensitivity lends to rise to a maxi- 

ftequency is further in- 

fre™™ ^ g^^dual shift m sensitivity peak toward higher 

frequencies as adaptation level ,s increased ^ 

as Daran^\er^^fh^^H curves with modulation frequency 

cSteTd’s let HI '=”Po™l sensitivity 

luminance curves obtimyiy’othem ^“sitivity versus 

form of lummanr-« f ^ othcrs The data can also be shown in the 

tbs form modulation ratios, m 

versus lurama^e ^Tnangly. the more conventional CFF 

pamcular”ute=sS ^=«angular wave forms Of 

three parameters only 

high frequencv threshlilH single valued, both low and 

range for the three ™alle« moduto™ mtmS' 
Individual differences, color, and color adaptation dependencies have
also been explored by Kelly (1962b, 1962c) in his analysis of the
time-dependent responses. Of major interest, however, is his basic
single-channel model proposed to account for the average white light data of his
group of eight observers.

Kelly (1961b) seeks to account for his experimental data by postulating
signal transformations at two levels in the visual receptor system. The
model, which takes account of the neural refractory periods and the
“all-or-none” law, is more physiological than the usual electrical network
analogs. The first stage is a photoreactive linear one whose filtering action
controls the shape of the amplitude sensitivity curve at low frequencies;
the second stage, a pulse modulator or encoding stage that converts the
first stage output into trains of impulses of identical heights and durations
but spaced at varying time intervals, controls the high frequency cut-off.
The amplitude response function of the first stage (a transform function of
a linear differential response) is combined with the threshold amplitude
response function of the pulse-encoding stage and with proper choice of
parameters is shown to fit the white light amplitude sensitivity data. The
model is also shown to predict reasonably well the transient responses
obtained in other types of psychophysical and electrophysiological
experiments. The details of the model are given in Kelly (1961b).

3. QUALITATIVE VARIATIONS OF STIMULATION
3.1 Stimulus and Perceptual Domains

Some of the most intriguing and refractory problems of visual
psychophysics arise in the analysis of color perception where the relations are
three-dimensional.

The psychological dimensions of perceived color may be conceived as
hue, saturation, and brightness; the physical dimensions as radiant
energy or total light intensity, wavelength, and wavelength distribution.
The wavelength distribution specifies for a given light stimulus the relative
amounts of energy throughout the visible spectrum. In the event that the
stimulus is a so-called monochromatic spectral light, the wavelength
distribution is specified as the spectral bandwidth, and the stimuli are
often treated as if they were limited to only the single wavelength on which
the actual band is centered.

It is, of course, known that there is not a unique, one-to-one
correspondence between each uniquely specifiable physical light stimulus and the
color perception that it elicits; in general, a variety of physical stimuli can
be used to elicit identical color sensations.
The stimulus domain undergoes a first-order transformation to a
psychophysical domain by the determination of three-dimensional
color-equivalence relations in color-mixture experiments. Three spectral
stimuli are selected, each of which is fixed in bandwidth and wavelength,
but variable in energy or radiant flux. The amounts and proportions of
these stimuli are then varied in order to produce a color equation for each
of a finite series of wavelengths taken to represent the visible spectrum. In
principle, these equations may be written as*

$$F_\lambda = F_1 + F_2 + F_3,$$

where $F_\lambda$ is the arbitrary unit radiant flux of the given spectral wavelength
to be matched and $F_1$, $F_2$, and $F_3$ are the radiant flux proportions of the
three mixture primaries required for the color equation. In practice, the
spectral color-match situation is represented by

$$F_\lambda + F_{1,2,3} \equiv F_1 + F_2 + F_3,$$

where $F_{1,2,3}$ represents the proportion of one of the three mixture-
primaries which is combined with the test stimulus $F_\lambda$ in order to make the
actual color match, represented by the symbol $\equiv$.* The expression
written in this form takes cognizance of the fact that narrow band spectral
test stimuli appear more saturated than do mixtures of the three spectral
primaries, and hence the spectral test stimulus must be desaturated by the
addition of one of the primaries to establish a complete color match.
Under these circumstances, one of the terms in the first equation takes on
a negative value. A complete series of matching relations constitutes the
experimental data for the unit color equations by means of which any color
stimulus of known wavelength distribution and energy content can be
specified in terms of its three-variable mixture equivalent.*
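The appearance of a negative term can be illustrated numerically. The sketch below is not taken from any of the experiments discussed here: it assumes a purely hypothetical three-receptor description in which each primary and the test light produce a triple of receptor responses, and every number in it is invented. It solves the "in principle" color equation for the three primary amounts and reports which primary comes out negative, that is, which primary would have to be added to the test stimulus in the actual match.

```python
# Columns are the three mixture primaries, rows are three hypothetical
# receptor types; every number here is invented for illustration.
RESPONSES = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
# Hypothetical receptor responses to one unit of a saturated spectral test light.
TEST = [0.1, 0.6, 0.5]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 linear system m * a = rhs by Cramer's rule."""
    d = det3(m)
    amounts = []
    for j in range(3):
        replaced = [[rhs[i] if c == j else m[i][c] for c in range(3)]
                    for i in range(3)]
        amounts.append(det3(replaced) / d)
    return amounts


amounts = solve3(RESPONSES, TEST)
# Primaries with negative amounts must be added to the test side of the match.
desaturating = [j + 1 for j, a in enumerate(amounts) if a < 0]
print("primary amounts:", [round(a, 3) for a in amounts])
print("primaries moved to the test side:", desaturating)
```

With these invented numbers only the first primary receives a negative amount, so the match would be set up with that primary mixed into the test field and the other two primaries on the comparison side.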

If the mixture primaries are evaluated in terms of their relative
luminances rather than in percentage units of radiant flux, the color equation
becomes

$$L_\lambda = L_1 + L_2 + L_3,$$

in which the luminances are implied to be strictly additive. That is,
the radiant energy of the spectral test stimulus $\lambda$ evaluated by the
luminosity function is stated to be identical to the sum of the radiant
energies of the three mixture primaries $\lambda_1$, $\lambda_2$, and $\lambda_3$
evaluated by the same luminosity function. The extent to which this
requirement is strictly met in experimental test situations is currently of
considerable interest and in some doubt (Graham, 1959; Judd, 1955). For the
present purpose, we shall accept the equation as an approximately valid
expression of color equivalence in luminance terms.

* The symbol $\equiv$ has been adopted to represent visual color matches.
* Color matching relations may also be expressed in vector notation. See Hering
(1887), and Judd and Wyszecki (1963).

Fig. 1. Color equivalence relations in unit coordinates. WDW units. Drawn with
permission from Stiles & Burch (1955).

Treatments of color-mixture data do, in fact, rely completely on the
validity of the additivity principle, and many experimenters have followed
the practice of specifying the stimulus amounts in arbitrary scale units
together with some measure of relative luminances but with no direct
calibration of the physical energy units of the mixture primaries.

Figure 1 shows a set of unit coordinate color equivalence relations
reported by Stiles and Burch (1955). Here the units are based on the
amounts of the primaries required for two selected spectral matches, a
convention first proposed by W. D. Wright (1928-1929) and known as the
WDW unit coordinate system. Figure 2 shows the same color-mixture

data evaluated in terms of relative luminosities for a spectrum of unit
energy.

Fig. 2. Spectral mixture functions for spectral primaries 645 mμ, 526 mμ, and 444 mμ.
Units of relative luminance for an equal energy spectrum. (Function for 444 mμ primary
is plotted as ten times the actual values.) The three ordinate values at each wavelength
[...] spectral luminosity function. Drawn with permission from Stiles & Burch

Figure 3 shows a plane in chromaticity space plotted from the unit
coordinate data of Fig. 1. If we assume that the viewing conditions (for
example, stimulus size, surround stimulation, state of adaptation, and
retinal locus) are held constant, this space has the following properties.
Any stimulus of known physical characteristics can be represented within
it. All stimuli within the triangle formed by the loci of the three primaries
[...] can be matched by positive [...]. Stimuli located along any straight
line in the space [...] mixtures of two stimuli between whose [...]
encompassed. All physical stimuli represented at the same locus in this
space have the same mixture equivalent expressed in amounts of the
specified spectral primaries. Since the absolute amounts of the mixture
equivalents need not be identical for all such stimuli, such a chromaticity
space does not specify that all stimuli having identical loci are identical in
perceived color, but it does say that they can be made to appear identical
by a simple adjustment in amount, with no change in their spectral
distributions. If, furthermore, a luminance specification is added to the
information contained in the chromaticity space, then we can imagine a series
of spaces identical to that shown in Fig. 3, each one for a specified, constant
level of luminance. Under these circumstances all stimuli that have the
same locus in the chromaticity space of constant luminance must be
identical in color appearance.
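These properties of a chromaticity plane can be checked with a few lines of arithmetic. The sketch below assumes the usual unit-coordinate construction, in which each primary amount is divided by the sum of the three amounts; the function names and the stimulus values are illustrative only. It shows that changing the amount of a stimulus leaves its locus unchanged, and that an additive mixture of two stimuli falls on the straight line joining their loci.

```python
def chromaticity(amounts):
    """Unit coordinates: each primary amount divided by the sum of the three."""
    total = sum(amounts)
    return tuple(a / total for a in amounts)


def mix(first, second):
    """Additive mixture of two stimuli expressed as primary amounts."""
    return tuple(a + b for a, b in zip(first, second))


stimulus = (2.0, 1.0, 1.0)                   # hypothetical primary amounts
brighter = tuple(3.0 * a for a in stimulus)  # same light, three times the amount
other = (0.5, 2.0, 0.5)                      # a second hypothetical stimulus

c1 = chromaticity(stimulus)
c2 = chromaticity(other)
cm = chromaticity(mix(stimulus, other))

# Weight of the first stimulus in the mixture, by total amount.
w = sum(stimulus) / (sum(stimulus) + sum(other))
on_line = tuple(w * p + (1.0 - w) * q for p, q in zip(c1, c2))

print("locus of stimulus:        ", c1)
print("locus of brighter version:", chromaticity(brighter))
print("locus of mixture:         ", cm)
print("point on joining line:    ", on_line)
```

The locus alone therefore says nothing about amount, which is why a luminance specification has to be added before identical loci can be taken to mean identical color appearance.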

A chromaticity chart for a plane of constant luminance is, then, a form
of three-dimensional psychophysical relation for perceived color, but only
in the sense of specifying perceptual identities. The locus of a stimulus in
this space provides no information whatsoever about the hue of the color or
how saturated or bright it appears to be. We know only that the hue,
saturation, and brightness are the same, respectively, for all those stimuli
that have the same chromaticity and luminance specifications; nor does
the distance separating two distinct loci in this chromaticity space provide
any information concerning the magnitude of the perceived color difference
between them. Pairs of stimuli whose chromaticity loci are equally distant
need not look equally different in color. Had the basic color mixture
experiments been carried out with spectral mixture primaries different
from those used by Stiles and Burch (1955), the appearance and the spacing



of the chromaticity space plotted in Fig. 3 would also have been different.

Fig. 3. Chromaticity chart with spectrum locus in WDW units for spectral mixture
primaries 645 mμ, 526 mμ, and 444 mμ. Drawn with permission from Stiles & Burch
(1955).

Fig. 4. Stiles and Burch color-mixture data transformed to CIE tristimulus values.
All values are positive, the $\bar{y}$ function is the spectral luminosity function, and the
[...] realizable stimuli. Drawn with permission from Stiles & Burch (1955).

Once a set of color-mixture data has been established experimentally
for any three arbitrarily selected primaries, the data
can be expressed in terms of any other arbitrarily selected set by means of
a simple linear transformation of the form

$$
\begin{aligned}
F_1' &= a_{11}F_1 + a_{12}F_2 + a_{13}F_3,\\
F_2' &= a_{21}F_1 + a_{22}F_2 + a_{23}F_3,\\
F_3' &= a_{31}F_1 + a_{32}F_2 + a_{33}F_3.
\end{aligned}
$$

Figure 4 shows such a transformation of the Stiles and Burch data to the
primaries that have been agreed upon by the International
Commission on Illumination (known by the initials CIE for Commission
Internationale de l’Éclairage) as a conventional standard for chromaticity
specification, and in Fig. 5 the same data are represented in CIE
chromaticity space. All stimuli that have a common locus in Fig. 3 transform to a
new common locus in Fig. 5, but stimulus pairs separated by equal distances
in Fig. 3 are separated by unequal distances in Fig. 5. The CIE
chromaticity space has certain advantages of convenience (all values are positive)
and convention to recommend it, but it has neither more nor less
perceptual significance than does the original on which it is based.
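The two statements about the transformed chart, that common loci stay common while equal distances become unequal, can be illustrated with a small calculation. The coefficients in the sketch below are hypothetical and are not the CIE transformation coefficients; the stimuli are likewise invented. Any linear change of primaries whose column sums differ behaves in the same qualitative way.

```python
import math

# Hypothetical transformation coefficients a_ij; these are NOT the CIE values.
A = [
    [0.5, 0.3, 0.2],
    [0.2, 0.7, 0.1],
    [0.0, 0.1, 0.9],
]


def transform(a, amounts):
    """Re-express primary amounts in a new primary set: new_i = sum_j a_ij * old_j."""
    return tuple(sum(a[i][j] * amounts[j] for j in range(3)) for i in range(3))


def chromaticity(amounts):
    """First two unit coordinates of a stimulus (the third is 1 minus their sum)."""
    total = sum(amounts)
    return (amounts[0] / total, amounts[1] / total)


def distance(p, q):
    """Euclidean distance between two points of a chromaticity chart."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Three stimuli equally spaced along a line in the original chart.
old = [(0.2, 0.2, 0.6), (0.4, 0.2, 0.4), (0.6, 0.2, 0.2)]
new = [transform(A, s) for s in old]

old_steps = [distance(chromaticity(old[i]), chromaticity(old[i + 1])) for i in range(2)]
new_steps = [distance(chromaticity(new[i]), chromaticity(new[i + 1])) for i in range(2)]

# A stimulus and a scaled copy share one locus before and after the transformation.
scaled = tuple(5.0 * v for v in old[0])
same_locus = distance(chromaticity(transform(A, old[0])),
                      chromaticity(transform(A, scaled))) < 1e-12

print("steps in original chart:   ", old_steps)
print("steps in transformed chart:", new_steps)
print("common locus preserved:    ", same_locus)
```

The equal steps of the original chart come out unequal after the transformation, while a stimulus and its scaled copy still coincide, which is all that a chromaticity chart is required to preserve.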

The importance of a chromaticity space that permits the unique specification
[...] identical color perceptions under standard [...] equivalence identity
[...]. Further steps are needed to supplement [...]
relations with a system of stimulus [...] differences [...]
perceived color [...].
For this purpose, any one of a variety of approaches may be followed.
Some involve specific theoretical [...] the nature of the
color vision mechanism, whereas others [...] minimally on a priori
notions about visual [...] assumptions about [...]
transformations and color difference equations see Judd & Wyszecki, [...]

Fig. 5. Spectrum locus [...] chromaticity chart. Drawn [...]
In the following sections, we shall discuss some of these different
approaches and in so doing try to present a reasonably representative
picture of the logical and mathematical handling of some of the basic
issues in color vision. We shall neither attempt to provide a comprehensive
historical survey nor limit the discussion exclusively to current work. We
shall, however, consider the early approaches where they seem essential
as background for the later developments.


3.2 Helmholtz’s Line-Element


An early attempt to handle perceived color space in quantitative fashion
was made by Helmholtz and it is reported in the second edition of his
treatise on physiological optics (1896). The data on which his treatment
depended include color-mixture equivalence relations, measures of
minimally discriminable differences among adjacent wavelengths of constant
luminance, and measures of minimally discriminable intensity differences
for white light and for a series of spectral wavelengths.

Helmholtz assumed in accordance with Young’s theory of color vision
that the total gamut of color sensations is determined by three fundamental
hue sensations, $S_1$, $S_2$, and $S_3$, in varying amounts and proportions. The
magnitudes of the three separate sensation differences are written as
$dS_1$, $dS_2$, $dS_3$, and $dS$ represents the net difference in perceived color.
Since each of these terms represents a difference in sensation, $dS = 0$
only when $dS_1 = dS_2 = dS_3 = 0$. The separate sensation differences
were assumed by Helmholtz to combine in the following manner:

$$(dS)^2 = (dS_1)^2 + (dS_2)^2 + (dS_3)^2.$$

The threshold value for the perception of a difference is set equal to one,
but values less than one for each of the three separate sensations are
assumed to contribute to a net sensation difference that has a minimum
value of one.

To relate the formula for sensation differences to stimulus measures,
Helmholtz stated the following set of relations, which he described as a
more precise expression of Fechner’s law:

$$
\begin{aligned}
dS_1 &= X\,\frac{dx}{x}\cdot\frac{1}{1 + lx + my + nz},\\
dS_2 &= Y\,\frac{dy}{y}\cdot\frac{1}{1 + lx + my + nz},\\
dS_3 &= Z\,\frac{dz}{z}\cdot\frac{1}{1 + lx + my + nz}.
\end{aligned}
$$
In this expression, $x$, $y$, $z$ are amounts of three fundamental excitations;
$X$, $Y$, $Z$ are functions of $x$, $y$, $z$ such that $X = Y = Z = k$ (a constant) at
high excitation levels, and at moderate light intensities

$$X = \frac{kx}{a + x}, \qquad Y = \frac{ky}{b + y}, \qquad Z = \frac{kz}{c + z},$$

where $a$, $b$, and $c$ are constants, assumed to represent the separate
components of what Helmholtz called the intrinsic light of the retina. The last
term $1/(1 + lx + my + nz)$, where $l$, $m$, and $n$ are constants, is a glare
factor introduced to account for the “damping” of the measured intensity.
Thus the sensation difference relations are essentially empirical
generalizations based on the data of intensity discrimination but interpreted as
relations between fundamental excitations and fundamental sensations.
(The symbols used here are not to be confused with the CIE notation.)

In developing his logical argument, Helmholtz states that when a color
is changed only in intensity, all its components are changed by the same
fraction $d\epsilon$, and thus

$$dx = x\,d\epsilon, \qquad dy = y\,d\epsilon, \qquad dz = z\,d\epsilon.$$

The sensation difference becomes

$$dS = \frac{d\epsilon}{1 + lx + my + nz}\,(X^2 + Y^2 + Z^2)^{1/2},$$

and for high intensities where $X = Y = Z = k$

$$dS = \frac{\sqrt{3}\,k\,d\epsilon}{1 + lx + my + nz}.$$

Two slightly different colors may be said to have the excitation values
$x$, $y$, $z$ and $x + dx$, $y + dy$, $z + dz$, respectively. If the first color is
increased in intensity by a factor $1 + \epsilon$, then the excitation component
differences between the two colors become

$$dx - \epsilon x, \qquad dy - \epsilon y, \qquad \text{and} \qquad dz - \epsilon z.$$

Their sensation difference becomes

$$(dS)^2 = X^2\left(\frac{dx}{x} - \epsilon\right)^2 + Y^2\left(\frac{dy}{y} - \epsilon\right)^2 + Z^2\left(\frac{dz}{z} - \epsilon\right)^2,$$

where the glare factor is equal to unity.
To determine the value of $\epsilon$ so that $dS$ is a minimum, $(dS)^2$ is differentiated
with respect to $\epsilon$ and the result set equal to zero. The value of $\epsilon$ becomes

$$\epsilon = \frac{X^2(dx/x) + Y^2(dy/y) + Z^2(dz/z)}{X^2 + Y^2 + Z^2}.$$



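By substituting this value of $\epsilon$ in the sensation difference equation

$$(dS)^2 = \frac{X^2Y^2\left(\dfrac{dx}{x} - \dfrac{dy}{y}\right)^2 + Y^2Z^2\left(\dfrac{dy}{y} - \dfrac{dz}{z}\right)^2 + Z^2X^2\left(\dfrac{dz}{z} - \dfrac{dx}{x}\right)^2}{X^2 + Y^2 + Z^2}.$$

The units in which $x$, $y$, $z$ and $X$, $Y$, $Z$ are expressed are arbitrary, but they
must be the same for both sets of terms.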
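The minimization step can be verified numerically. The sketch below assumes the quadratic form of the sensation difference with the glare factor equal to unity, that is, $(dS)^2 = X^2(dx/x - \epsilon)^2 + Y^2(dy/y - \epsilon)^2 + Z^2(dz/z - \epsilon)^2$, and compares the value at the stationary $\epsilon$ with the closed-form expression in the pairwise differences. The numerical values of $X$, $Y$, $Z$ and of the relative excitation changes are invented for the check.

```python
def ds_squared(eps, X, Y, Z, rx, ry, rz):
    """Squared sensation difference; rx, ry, rz stand for dx/x, dy/y, dz/z."""
    return (X ** 2 * (rx - eps) ** 2
            + Y ** 2 * (ry - eps) ** 2
            + Z ** 2 * (rz - eps) ** 2)


def best_eps(X, Y, Z, rx, ry, rz):
    """Value of eps at which the derivative of ds_squared with respect to eps is zero."""
    return (X ** 2 * rx + Y ** 2 * ry + Z ** 2 * rz) / (X ** 2 + Y ** 2 + Z ** 2)


def min_ds_squared(X, Y, Z, rx, ry, rz):
    """Closed form obtained by substituting best_eps back into ds_squared."""
    numerator = (X ** 2 * Y ** 2 * (rx - ry) ** 2
                 + Y ** 2 * Z ** 2 * (ry - rz) ** 2
                 + Z ** 2 * X ** 2 * (rz - rx) ** 2)
    return numerator / (X ** 2 + Y ** 2 + Z ** 2)


# Hypothetical values chosen only to exercise the formulas.
args = (1.0, 0.8, 0.5, 0.02, -0.01, 0.005)
eps_star = best_eps(*args)
at_minimum = ds_squared(eps_star, *args)
closed_form = min_ds_squared(*args)

print("eps at minimum:", eps_star)
print("minimum by substitution:", at_minimum)
print("minimum by closed form: ", closed_form)
```

The agreement of the two values reflects the fact that the minimum of a weighted sum of squared deviations about a common value depends only on the pairwise differences of the terms, which is why only differences such as $dx/x - dy/y$ survive in the final expression.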
Since the terms $x$, $y$, and $z$, however, presumably represent the
fundamental excitations of the three-variable color system, they must bear a
specific relation to the trivariate color-mixture data. Helmholtz assumed
that they could be represented by a simple linear transformation of the
color-matching data obtained by König. Two further restrictions were
placed on the permissible transformation. Helmholtz assumed that all
values must be positive and that the transformed functions must also
account for König’s empirical measures of minimally discriminable
wavelength differences for colors of equal brightness.

The $x$, $y$, and $z$ excitation values are solved for by

$$
\begin{aligned}
x &= a_{11}R + a_{12}G + a_{13}B,\\
y &= a_{21}R + a_{22}G + a_{23}B,\\
z &= a_{31}R + a_{32}G + a_{33}B,
\end{aligned}
$$

where $x$, $y$, $z$ are positive values and $R$, $G$, $B$ are measured amounts of
three stimuli used for spectral color equations (or a linear transformation
of such measures). The values $x$, $y$, $z$ must simultaneously satisfy the
following equation

$$\left(\frac{dS}{d\lambda}\right)^2 = \frac{k^2}{3}\left[\left(\frac{1}{x}\frac{dx}{d\lambda} - \frac{1}{y}\frac{dy}{d\lambda}\right)^2 + \left(\frac{1}{y}\frac{dy}{d\lambda} - \frac{1}{z}\frac{dz}{d\lambda}\right)^2 + \left(\frac{1}{z}\frac{dz}{d\lambda} - \frac{1}{x}\frac{dx}{d\lambda}\right)^2\right],$$

where $d\lambda$ is the measured value for the least discriminable wavelength
difference [...] the spectrum. The level of [...] that $X$, $Y$, and $Z$ are all
equal to $k$.

[...] of the three excitation functions resulting
from Helmholtz’s solution are shown in Fig. 6, and Fig. 7 shows the
[...] spectrum locus within this space.

Fig. 6. Helmholtz’s fundamental excitation curves. For the units of the $x$, $y$, and $z$
curves, see Helmholtz (1896, p. 455).

Fig. 7. Helmholtz’s chromaticity space based on data in Helmholtz (1896, p. 455).

To convert [...] transformation of the color-mixture space to a
space that represents minimal sensation differences uniformly, Helmholtz
stated the following relations between the excitation processes $x$, $y$, $z$
and their sensory effects $\xi$, $\eta$, $\zeta$:

$$
\begin{aligned}
\log (a + x) &= \xi,\\
\log (b + y) &= \eta,\\
\log (c + z) &= \zeta.
\end{aligned}
$$

Consequently, for

$$(dS)^2 = (dS_1)^2 + (dS_2)^2 + (dS_3)^2$$

we write

$$(ds)^2 = (d\xi)^2 + (d\eta)^2 + (d\zeta)^2.$$

It is in the logarithmic 3-space for $\xi$, $\eta$, $\zeta$ that minimal sensation differences
correspond to line-elements within the space.
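The connection between the logarithmic coordinates and the Fechner-type relations can be made explicit in a short worked step. It rests on three assumptions that are stated here rather than taken from the text: the logarithms are natural, the glare factor is equal to unity, and the moderate-intensity form $X = kx/(a + x)$ applies (with the corresponding forms for $Y$ and $Z$).

```latex
\begin{aligned}
d\xi &= d\,\log(a + x) = \frac{dx}{a + x},\\[4pt]
dS_1 &= X\,\frac{dx}{x} = \frac{kx}{a + x}\cdot\frac{dx}{x} = k\,\frac{dx}{a + x} = k\,d\xi,\\[4pt]
(dS)^2 &= (dS_1)^2 + (dS_2)^2 + (dS_3)^2
        = k^2\left[(d\xi)^2 + (d\eta)^2 + (d\zeta)^2\right] = k^2\,(ds)^2 .
\end{aligned}
```

Under these assumptions the sensation difference is simply the Euclidean distance in the logarithmic space scaled by the constant $k$, so that equal distances there correspond to equal sensation differences.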
The Helmholtz line-element approach has been presented in this much
detail to make clear the extent to which the mathematical treatment
depends on specific hypotheses of the underlying theory and to specify
the kinds of data actually incorporated in the particular solution developed.
Helmholtz’s solution was not accepted largely for reasons related to
some widely held convictions about the visual mechanism. One of the
primary objections to it rested upon an a priori notion that fundamental
spectral distribution functions, of which Helmholtz’s $x$, $y$, $z$ are an example,
must describe the absorption spectra of the three selective photochemicals
assumed to be present in the retinal cones. The broad, double-humped
functions (Fig. 6) that resulted from Helmholtz’s solution were considered
unlikely candidates to represent the spectral absorptions of the visual
photochemicals. Helmholtz’s curves are also difficult to reconcile with the
form of the spectral luminosity function (Schrödinger, 1920).


3.3 Sinden’s Reevaluation of the Line-Element

Sinden’s (1937, 1938) attempt to improve on Helmholtz’s solution was
motivated by the idea that the approach was promising but that
Helmholtz had been led far afield by the deficient data that were available to
him. Sinden was convinced, furthermore, that the arithmetical trial-and-
error procedure used by Helmholtz in solving for his basic excitation values
involved an unrealistic amount of numerical computation and an
[...] available experimental data. Sinden himself therefore
started from the geometry of color space, defined as a three-dimensional
rectangular coordinate system in which the coordinates correspond to
varying amounts of three primary colors. The locus of colors of like
[...] the origin (zero intensity). The direction of the line with respect
to [...] chromaticity, and the distance between the origin [...] defines the
excitation intensity at that point. The [...], designated by $x$, $y$, and $z$,
are assumed to represent the natural [...] of the color sense mechanism,
and all real colors [...] the positive octant of the geometrical system.
[...] represented by a group of straight lines contained [...] positive
octant that form a cone-shaped open surface
with apex at the origin. Accepting both Helmholtz’s quadratic form of
sensation difference equation and the Fechnerian relation between
excitation strength and sensation magnitude, Sinden’s expression for
color difference between equally bright colors is [...]
Sinden devised an ingeniously constructed projection apparatus to
project the image of a spectral locus plotted in a color triangle based on
three spectral mixture primaries onto a logarithmic triangular coordinate
system outlined on a transparent projection screen. By marking off on the
spectral locus the varying extents corresponding to experimental measures
of minimally discriminable wavelength differences, it was possible to
manipulate the projection orientation of the slide containing the spectral
locus in order to yield approximately equal distances for the measured
$\Delta\lambda$ extents on the projected image in logarithmic coordinates. The
geometric transformation relations between the linear coordinates of the
stimulus space and the logarithmic coordinates of the projected space
were determined approximately by direct measurement in the projection
apparatus, and the approximate solution made more precise by numerical
calculations.

Although Sinden’s transformation yielded a metric that he considered
to be an improvement over Helmholtz’s solution with respect to data for
wavelength discrimination and colorimetric purity discrimination, the
solution yielded a result that contradicted one of the fundamental
assumptions of the theory on which it was based: the luminosity of one of
the three primary colors was found to be negative by Sinden. Although
he saw that such a result might be interpreted to mean that there is an
independence between the mechanisms for brightness and for hue
perceptions, he refrained from making this interpretation since it represents
a radical departure from the classical theory of color vision on which the
approach was based.


3.4 Stiles’ Increment Threshold and the Line-Element

Contemporary interest in the line-element is perhaps best represented by
the work of W. S. Stiles (1946, 1949a, 1959) that was started in the 1940’s
and continues to the present. His aim has been to derive the fundamental
distribution functions for vision from discrimination data obtained in
what he has described as “increment threshold” experiments.
In the standard visual intensity discrimination experiment, light of a
given wavelength or wavelength distribution is used and the measures
represent the minimal difference in amount of such light that an observer
can detect with a given probability. If Weber’s law were to hold exactly,
then the discriminable intensity increment would represent a constant
fraction of the background intensity, $\Delta I = kI$. The form of the function
typically found in experiments of this sort is, approximately,
$\log \Delta I = k \log (I + c)$. Suppose, however, that the wavelength of the background
light is kept constant and that the intensity increments are measured for a
series of lights that differ in wavelength from that of the background
stimulus. If the eye is equally sensitive to all such wavelengths, then the
form of the discrimination function will presumably remain unchanged,
and the value of the $\Delta I$ increment for any given background intensity will
also be the same. If the eye is not equally sensitive to all wavelengths, then
we cannot expect the same $\Delta I$ values from experiments in which different
wavelengths of background and test stimuli are used for the increment
threshold measures. Stiles’ first and basic assumption is that the form of
the function will be the same for all such wavelengths, but that for test
stimuli of different wavelengths the different functions will be displaced in a
direction parallel to the $\log \Delta I$ ordinate. Thus Stiles anticipated a family
of parallel functions from such a series of experiments, with the function
for the increment threshold wavelength to which the eye is least sensitive
showing maximal upward displacement. He assumed further that the
amount of this upward displacement could be used as a measure of the
relative sensitivity to the different wavelengths of the increment stimuli.
For the corollary series of experiments, in which the test stimulus
wavelength is maintained constant, but the wavelength of the background
stimulus is changed from one experiment to the next, Stiles anticipated a
second family of functions displaced relative to each other in a direction
parallel to the abscissa. In this case, the function for the background to
which the eye is least sensitive would be displaced maximally to the left,
and again Stiles assumed that the degree of displacement would serve as a
measure of the relative sensitivity of the eye to the different background
wavelengths. If the form invariance and displacement assumptions are
correct, then the sensitivity distribution functions derived by the
displacement measures from the two series of experiments should coincide
within the limits determined by the precision of the data from
the different experiments.
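The logic of reading sensitivities from displacements can be sketched in code. The example below is only an illustration under stated assumptions: it takes the approximate form $\log \Delta I = k \log (I + c)$ as the common function, sets the constants `K` and `C` to arbitrary values, and treats the sensitivities `s_test` and `s_field` as hypothetical multiplicative factors acting on the test and background energies. None of these names or values comes from Stiles' papers.

```python
import math

K, C = 1.0, 1.0  # hypothetical constants of the common function log dI = K log(I + C)


def log_threshold(log_w, s_test, s_field):
    """Log10 increment threshold for a background of log10 energy log_w.

    s_field scales the effective background energy; s_test scales the
    effectiveness of the test increment. Both are assumed, illustrative factors.
    """
    effective_background = s_field * 10.0 ** log_w
    return K * math.log10(effective_background + C) - math.log10(s_test)


backgrounds = (-2.0, 0.0, 2.0)

# Same background, two test wavelengths: the curves differ by a constant
# vertical offset equal to the log ratio of the two test sensitivities.
vertical_offsets = [log_threshold(w, 0.25, 1.0) - log_threshold(w, 1.0, 1.0)
                    for w in backgrounds]

# Same test, two background wavelengths: the curves coincide after a
# horizontal shift equal to the log ratio of the two field sensitivities.
shift = math.log10(1.0 / 0.1)
horizontal_mismatch = [log_threshold(w, 1.0, 1.0) - log_threshold(w + shift, 1.0, 0.1)
                       for w in backgrounds]

print("vertical offsets:", vertical_offsets)
print("mismatch after horizontal shift:", horizontal_mismatch)
```

In this idealized setting the size of each displacement recovers the corresponding sensitivity ratio exactly, which is the sense in which the two series of experiments should yield coinciding sensitivity distributions if the form invariance and displacement assumptions hold.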

The increment threshold analysis is extended directly to the
[...] not with a single spectrally
selective [...] distribution, but rather with three such distributions.
Suppose that the log energy of the least discriminable
test stimulus $U$ of wavelength $\lambda$ is measured as a function of log energy of
the background field $W$ of wavelength $\mu$. Suppose further that the data
are described by the function shown in Fig. 8, which is apparently made up
of three component functions $a$, $b$, and $c$. The analysis assumes that
there is a single function of the form shown in Fig. 9, $\log \zeta(x)$. Each of the
curve components $a$, $b$, and $c$ is assumed to be a segment of this common
function. The amount by which the common function must be displaced
parallel to the ordinate to fit the $a$, $b$, and $c$ segments, respectively, is a
measure of the relative sensitivities of the three assumed component
mechanisms to the test wavelength $\lambda$.

Fig. 8. Illustrative increment threshold function with three component mechanisms.
Log test field increment ($\Delta I_\lambda$) plotted against log surround intensity ($I_\mu$). Adapted with
permission from Stiles (1949a, p. 158).

Fig. 9. Stiles’ theoretical “common component” showing the way in which log
increment threshold varies with log background for a color mechanism.
Adapted with permission from Stiles (1949b, p. 221).

The displacements parallel to the
abscissa that are required to fit the same common function to the three
component segments are a measure of the relative sensitivities of the three
component mechanisms to the background wavelength $\mu$. The curve
[...] to a common function [...] a very large
number of increment threshold functions measured for his own eye.
The spectral distributions [...] fundamental mechanisms derived by
Stiles [...] (1949b) [...] are shown in Fig. 10.
[...] formula for the minimal [...]
line-element in uniform color space [...]

$$(ds)^2 = \left[\frac{\delta R}{\rho}\,\zeta(R)\right]^2 + \left[\frac{\delta G}{\gamma}\,\zeta(G)\right]^2 + \left[\frac{\delta B}{\beta}\,\zeta(B)\right]^2.$$
Here, $\rho$, $\gamma$, and $\beta$ are quantities proportional to the limiting Weber
fractions of the three component mechanisms, which are understood [...]
one from the other. The limiting Weber fraction is greatest
(and thus difference sensitivity is least) for the B mechanism; the G
and R mechanisms have limiting fractions that are nearly equal, although
the G mechanism is slightly more sensitive than the R one.

The essential form of the color difference formula used by Stiles [...]
so far represents his work as reported [...].

The Stiles approach depends on the assumption that the
increment thresholds are determined [...] component
mechanisms whose spectral sensitivity [...] are invariant in form.
Adaptation to a [...] wavelength and
intensity can presumably affect the level of sensitivity of [...]
but not the form of its spectral sensitivity distribution. This assumption
is frequently made in analyses [...]
formulation as a three-variable [...] stems from the early
work of von Kries (1905). It states that the relative sensitivity levels of
three spectrally [...] adapting light in proportion [...]
relative sensitivities to that light.

Stiles continued his increment threshold experiments during the past
ten years to include other [...] extended range of stimulus
variables, and he found that the data could not be accounted for by three such
invariant distributions that [...] invariant set of displacement rules.
A number of avenues [...] to be open for further
exploration. The data could [...] the assumption
is untenable, they could [...] something other than (or in addition
to) a multiplicative change in [...] level is produced by the background stimulus,
or that the assumed invariances hold but more than the three anticipated
component mechanisms are involved. This third possibility seems
to be the only one that Stiles has seriously considered, and his more recent
papers (1953, 1959) report the emergence of not three, but at least five and
perhaps seven different component sensitivity distributions. Stiles himself
is not yet clear on what the implications of these multiple sensitivity
distributions may be for the mechanism of color vision, nor is it yet
obvious what modifications they will require in the further development of
his approach to the problem of uniform color space (Graham, 1959).

Both the Helmholtz and the Stiles approaches require the use of data
that permit analysis in terms of basic sensitivity or excitation distributions.
The color space is then built on these distributions as foundation, and the
assumption of a specific formula for minimal color differences defines the
line-element of the space. A different approach, but again one that
develops a color space on the basis of minimally discriminable color
differences, is illustrated by MacAdam’s work.


3.5 MacAdam Discrimination Ellipses

MacAdam [...] set out to determine experimentally the discrimination
[...] in a constant luminance plane of CIE chromaticity space, and later
([...] 1949), to extend the measures to include [...] luminance and
chromaticity variations. The chromaticity discrimination measures that
he used were standard deviations of color-[...]. In the first set of
experiments, all color matches were made using [...] mixture-stimuli that
were chosen so that they were of [...] mixture proportions, and so that
when a color match to [...] mixture-stimuli could have identical [...].
[...] mixture-stimuli were used to obtain color [...] point in the
chromaticity space [...]. The [...] “true” color match is important
because [...] center at the same chromaticity for all individuals. If the
[...] (that is, different spectral [...]) [...] condition of identity in
color appearance, [...] differences in color vision would preclude the
assumption of a unique chromaticity center as the “true” color match.

[...] chromaticity centers C, and for [...] standard deviation of matches
along several (5 to 9) straight lines radiating from C. The standard deviations
were based on 50 matches with each binary stimulus combination. MacAdam
has fitted ellipses to the discrimination loci, and Fig. 11 shows
some of these ellipses and the variation in their dimensions and their
orientations from one position of C to another.

Fig. 11. Discrimination ellipses in CIE chromaticity space. Radii of ellipses shown are
ten times the experimentally measured standard deviations. Adapted with permission
from MacAdam (1949, p. 228).

On the assumption that color space has a Riemannian metric, the
elementary distance $ds$ between two points whose chromaticity coordinates
are $(x, y)$ and $(x + dx, y + dy)$ can be written in positive quadratic form

$$ds^2 = g_{11}\,dx^2 + 2g_{12}\,dx\,dy + g_{22}\,dy^2,$$

where $g_{11}$, $g_{12}$, and $g_{22}$ are the coefficients of the metric.
MacAdam’s experimental data, determined in the CIE
$x$, $y$ chromaticity space for a plane of constant luminance, are represented
as ellipses and are described by the equation (MacAdam, 1949)

$$g_{11}\,\Delta x^2 + 2g_{12}\,\Delta x\,\Delta y + g_{22}\,\Delta y^2 = 1.$$

To convert any given ellipse in a given region of the chromaticity diagram
to a circle in which minimally discriminable color differences would be
represented by equal distances from the center color C, the coordinate
network in the region is redrawn with the angle $\omega$ between the $x$ and $y$
coordinate directions given by

$$\cos\omega = \frac{g_{12}}{(g_{11}\,g_{22})^{1/2}}.$$

The units of length along the new $x$ and $y$ directions are proportional
to $(g_{11})^{1/2}$ and $(g_{22})^{1/2}$, respectively. MacAdam has worked out such a
solution for his data for different regions of the chromaticity diagram,
limiting each region to a space within which the $g$ values do not differ by
more than about 25%. The total surface is then reconstructed by joining
together the part surfaces for which individual solutions have been
obtained.
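
The redrawing of the coordinate network can be checked numerically. The following is a minimal sketch and not MacAdam's own procedure: the metric coefficients are hypothetical values chosen only so that the quadratic form is positive definite, the function names are illustrative, and the angle between the new axes is assumed to follow the convention $\cos\omega = g_{12}/(g_{11}g_{22})^{1/2}$, under which the law of cosines reproduces the quadratic form.

```python
import math

# Hypothetical metric coefficients for one small region (not MacAdam's data).
G11, G12, G22 = 4.0, 1.0, 2.0


def line_element(dx, dy, g11=G11, g12=G12, g22=G22):
    """Distance ds for a small chromaticity step (dx, dy)."""
    return math.sqrt(g11 * dx**2 + 2.0 * g12 * dx * dy + g22 * dy**2)


def to_uniform_plane(dx, dy, g11=G11, g12=G12, g22=G22):
    """Redraw (dx, dy) on oblique axes and return Cartesian (u, v)."""
    a = math.sqrt(g11)                # unit length along the new x direction
    b = math.sqrt(g22)                # unit length along the new y direction
    omega = math.acos(g12 / (a * b))  # assumed angle between the new axes
    u = a * dx + b * dy * math.cos(omega)
    v = b * dy * math.sin(omega)
    return u, v


def ellipse_point(theta, g11=G11, g12=G12, g22=G22):
    """Point in direction theta on the ellipse g11*dx^2 + 2*g12*dx*dy + g22*dy^2 = 1."""
    cx, cy = math.cos(theta), math.sin(theta)
    scale = 1.0 / line_element(cx, cy, g11, g12, g22)
    return scale * cx, scale * cy


if __name__ == "__main__":
    for deg in range(0, 360, 45):
        dx, dy = ellipse_point(math.radians(deg))
        u, v = to_uniform_plane(dx, dy)
        print(f"{deg:3d} deg: radius in redrawn plane = {math.hypot(u, v):.6f}")
```

Under these assumptions every point of the discrimination ellipse lands at the same distance from the center in the redrawn plane, which is the sense in which the ellipse becomes a circle.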

Geodesics of the resulting curved surface should represent series of
equiluminous colors that include the least number of just noticeable color
steps between any two colors. Figure 12 shows a number of such geodesics
plotted on the CIE chromaticity diagram between the chromaticity
locus representing the CIE standard illuminant C (artificial daylight) and
the chromaticities of a number of spectral wavelengths. Such geodesics
presumably represent series of colors that are constant in brightness and
constant in hue and that vary only along the saturation dimension.

It is important to note that the uniform chromaticity space developed
by MacAdam depends on the assumption that the chromaticities of the
color matches can be described by a normal probability distribution in
two dimensions, from which the Riemannian metric follows. We shall
examine the MacAdam experiments more critically later, but it may be
noted here that, although the normality assumption is difficult either to
prove or to disprove, LeGrand (1957), for instance, considers it very
unlikely. LeGrand proposes as an alternative the hypothesis of a normal
probability distribution for each of three receptor excitations (or, near
the absolute threshold, a Poisson distribution because of quantum light
fluctuations). LeGrand points out that if the elementary difference is not
a Riemannian one, the curvatures and other geometrical parameters
become only approximate, and there is no a priori reason for rejecting any
form of color difference equation.


3.6 Munsell Notation

A thoroughly different approach to the problem of establishing a
perceptually meaningful ordering for color space is illustrated by the
notational system for material color samples developed by A. H. Munsell
(1905, 1912, 1941). The starting point here was neither visual theory nor
color metrics. It was, rather, an interest in developing a simple, precise, and
logical symbolic color language that would serve for the painter much the
same function that musical notation does for the musician. Development
of the system for practical use required the production of material reference
samples that are ordered in a perceptually meaningful way. The differences
between samples to be ordered are not the minimal differences of
discrimination measures but steps that are many jnd's apart. The spacing
in this system is determined by a combination of some rather general
intuitive notions about the rational basis of perceptual color orderings and
adjustments based on observations of various sorts. Although the system
was originally developed primarily for artists and for use in painting
instruction, the carefully controlled, specified, stable, and reproducible
Munsell color samples have proved highly valuable also in scientific
laboratories and commercial color control problems (Judd & Wyszecki,
1963). The Munsell system is, furthermore, frequently referred to as a
standard for uniform color spacing (Burnham, 1949; Torgerson, 1958).
In this system, the geometric form of the perceptual color solid was
assumed a priori to be a sphere. The continuum of achromatic colors is
ordered along the vertical axis with midgray at the center and with samples
of increasing reflectance (= increasing “value”) in the upper half of the
sphere and those of decreasing reflectance (= decreasing “value”) in the
lower half. The spacing (on a scale from 1 to 10) of the value ordinate was
originally a logarithmic function of luminance in accordance with the
Fechnerian hypothesis, but it was later modified on the basis of observations
to a power function of luminance. The spacing in the present system is a
cube root function.


The notion of color balance was at the heart of the original Munsell
ordering. Thus any two “neutral” (N) or achromatic samples at equal
distances in the two directions from the midvalue sample presumably
balance to the midgray (N5) when taken in equal amounts on a color
wheel. The same principle is basic to the ordering of the chromatic
samples within the color sphere. Five principal “hues” (R, Y, G, B, P)
are arranged along the radii of each circular value plane perpendicular to
the vertical axis so that they too balance to the achromatic center of the
circle when taken in equal amounts on a color wheel. Principal colors of
equal value balance to the same value of achromatic sample, and so on.
Constant hue samples of varying saturation or “chroma” are ordered
along each radius, with color balance again serving as the criterion for
uniformity of spacing. Thus the original Munsell system was based on the
[...] samples that fulfill the requirements of [...] a priori to
characterize [...] requirements when ordered within [...] whose
geometric properties were also assumed a priori.

[...] been reevaluated for constancy of specified dimensions [...]
uniformity along the hue, value, and chroma continua of color [...]
numerical notation (Newhall et al., 1943). The [...] reevaluation by a
group of experienced visual scientists [...] observations of the samples
as spaced in the [...] were reported as [...] of sample relocations
required to satisfy the perceptual criteria.
It is a tribute to A. H. Munsell’s analytic and observational powers that
the scaling judgments in this reevaluation showed as little need for revision
of the spacing as they did. The material samples produced for the Munsell
Book of Colors are now specified both in terms of Munsell’s hue/value/chroma
system of notation and in terms of the CIE system of stimulus
specification (Kelly et al., 1943), and the spacing is frequently compared in
the latter system with color-difference formulas based on quite different
assumptions and data (Judd & Wyszecki, 1963).

It seems obvious, however, that any color space that is tied to a fixed
set of material color samples cannot have the same perceptual
characteristics for a variety of different illuminations and viewing conditions. If
such a color space were invariant in this sense, then we should have
complete object color constancy, which is not, in fact, the case. To cite only
one example, two Munsell “neutral” samples that differ only in Munsell
value (lightness) may indeed appear to be grays of different brightness when
viewed against a “neutral” background under some specified quality of
daylight illumination. When viewed under a chromatic illumination, the
same material samples on the same material background may appear
(1) one as neutral and the other as the hue of the illuminant, (2) one as
neutral and the other as a hue complementary to that of the illuminant, or
(3) one as the illuminant hue and the other as its complementary, depending
on the respective values (and thus reflectances) of the two “neutral”
samples and the “neutral” background (Helson, 1938). The usefulness of
a material sample color space is consequently restricted to a limited range of
viewing conditions, and both the spacing and the notational system become
progressively more questionable with increasing departures of the viewing
conditions from those established as “standard” for the system.


3.7 Opponent-Process Model

In the approaches toward development of perceptual color space that
have been considered thus far, assumptions about visual processes have
been limited to characteristics presumably required to account for the
data of color mixture and of specific discrimination functions. We shall
now consider a process theory approach that aims to (1) provide testable
relations between stimulus variables and apparent hue, brightness, and
saturation, relations that are, at the same time, consistent with the stimulus
equivalence relations of color matching, (2) encompass some of the
dependencies of spectral functions on stimulus luminance, including both
the color appearance relations and the spectral discrimination data, and
(3) account for some of the changes that occur in apparent color and color
discrimination when the general “field” conditions of visual stimulation
are altered.

We may start with a general definition of perceived color which states
that in the total visual field, the perceived color of any stimulus element,
considered as a discrete element in time and space, is a three-variable
complex of responses in the visual system. This complex of responses
involves a triple function of the excitation products of the light stimulus
energies at different wavelengths times the relative sensitivities of the
three processes of different spectral sensitivities, together with the activities
engendered by the occurrence of other visual response activities that are
closely related in time and/or space. The perceived color relation is
expressed as a three-dimensional vector equation


$$C = f\Bigl[\sum_\lambda e_\lambda X_\lambda,\ \sum_\lambda e_\lambda Y_\lambda,\ \sum_\lambda e_\lambda \Omega_\lambda\Bigr] + I,$$

where $f = (f_1, f_2, f_3)$ and $I = (I_1, I_2, I_3)$. Here $C$ is a three-dimensional
vector representing the perceived color; $e_\lambda$, the energy distribution of the
test stimulus throughout the visible spectrum; $X_\lambda$, $Y_\lambda$, and $\Omega_\lambda$, the linear
wavelength distributions of three variables of the visual system for a
stimulus of unit energy at all wavelengths; $f$ is a three-dimensional
vector-valued function of the sums throughout the spectrum of the
products contained in the expression; and $I$ is a vector representing the
incremental activity, not caused by the focal stimulation per se, but
caused by related visual response activities (Jameson & Hurvich,
1959). Specification of the three separate attributes of the perceived color
requires, of course, the further statement of three specific relations $f_1$, $f_2$,
$f_3$ [...] variables. Similarly, the induced activity $I$
may be thought of as composed of three separate components

$I_1$, $I_2$, and $I_3$. In this definition, a general restriction on the forms of the
three generalized functions $X$, $Y$, and $\Omega$ is that they should be linear
transforms of color-mixture data expressed in terms of relative amounts
of three stimulus variables required for complete spectral color equations.
In the [...] process theory under discussion, the linear [...]
(Hurvich & Jameson, 1955, [...]) [...] positive functions of wavelength,
which we [...] represent the initial light absorptions [...] transforms is a
set of three [...] functions that represent the relative responsiveness
of the [...] stimulation initiated in accordance with [...] positive
transforms. It is with these [...] transforms that we are primarily
concerned, since the proposed correlation with perceived color is with
respect to response functions of these distributions. The spectral
distribution functions are shown in Fig. 13 (Jameson & Hurvich, 1955).

Fig. 13. Experimentally measured chromatic and achromatic response distribution
functions for one observer.

These are experimental measures for one individual. The “white”
function is taken as the spectral luminosity distribution, the reciprocal of
the threshold energy measures for photopic conditions. The distributions
of each of the chromatic functions were measured for an equal brightness
spectrum by a null technique for hue cancellation. Given each of a series
of test stimuli of fixed energy $e_t$ and wavelength $\lambda_t$ from a region of the
spectrum (400 to 500 mμ) that elicits blue hue responses (red-blue,
green-blue, blue), determinations were made of the variable energy of a
cancellation stimulus of fixed wavelength $\lambda_c$ (say, 580 mμ) selected from
a region of the spectrum that elicits yellow hue responses (red-yellow,
green-yellow, or yellow) when the latter is mixed with the test stimulus in
proportions such that the additive light mixture is perceived as “neither
blue nor yellow” in hue. The same criterion is used for a series of yellow
test stimuli (500 to 700 mμ) and a fixed wavelength (say, 470 mμ) of blue
cancellation stimulus, and two additional series for red and green hue
cancellation complete the measures of relative hue strength on which the
chromatic distribution functions are based.
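The experimental distributions measured in this way for an individual
are consistent with the following transforms of the CIE tristimulus
distributions for average color-mixture data:

$$r_\lambda - g_\lambda = m\bar{x}_\lambda - m\bar{y}_\lambda,$$

$$y_\lambda - b_\lambda = n\bar{y}_\lambda - n\bar{z}_\lambda,$$

$$w_\lambda - bk_\lambda = p\bar{y}_\lambda.$$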

The bimodal, positive and negative, characteristics are assumed to represent
physiological processes that are opposed in nature, and they are required
to account for the mutual exclusiveness of the associated hue qualities in
the color perception. The distributions shown are taken to represent the
responsiveness of each of the three paired variables, and they refer to a
set of viewing conditions such that there is no incremental activity $I$ from
associated responses caused by directly preceding, or simultaneous but
spatially separate, stimulation in other parts of the visual field. For this
reason the distribution of opposite sign is not shown for the second member
of the achromatic white-black pair; opposed achromatic (blackness)
activity is assumed to result exclusively from induced responses.
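
The three linear transforms can be illustrated with a short sketch. It assumes the transforms have the form $r - g = m(\bar{x} - \bar{y})$, $y - b = n(\bar{y} - \bar{z})$, and $w - bk = p\bar{y}$; the default coefficient values, the sample tristimulus values, the function names, and the sign convention (positive for red and yellow, negative for green and blue) are all placeholders chosen for illustration, not values taken from the text or from CIE tables.

```python
def opponent_responses(x_bar, y_bar, z_bar, m=1.0, n=1.0, p=1.0):
    """Return the net paired responses (r - g, y - b, w - bk) at one wavelength.

    m, n, p are the transform coefficients; the defaults are placeholders.
    """
    red_green = m * x_bar - m * y_bar
    yellow_blue = n * y_bar - n * z_bar
    white_black = p * y_bar
    return red_green, yellow_blue, white_black


def hue_names(red_green, yellow_blue):
    """Name the active member of each chromatic pair (assumed sign convention)."""
    names = []
    if red_green > 0:
        names.append("red")
    elif red_green < 0:
        names.append("green")
    if yellow_blue > 0:
        names.append("yellow")
    elif yellow_blue < 0:
        names.append("blue")
    return names


if __name__ == "__main__":
    # Hypothetical tristimulus values for a single short-wave stimulus.
    rg, yb, wbk = opponent_responses(0.3, 0.1, 1.6)
    print(rg, yb, wbk, hue_names(rg, yb))
```

Because each pair is carried by a single signed quantity, red and green (or yellow and blue) can never be reported together, which is the mutual exclusiveness the opposed processes are meant to capture.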
The linear transformation equations that relate these responses to
color-mixture data in tristimulus units specify the psychophysical relations
between stimulus and responsiveness in terms of the three independent
pairs of response variables: white-black, red-green, and yellow-blue.
Relations are also stated to specify the perceived color in terms of the three
psychological attributes of brightness, hue, and saturation:


$$B = w - bk,$$

$$H_{r-g} = \frac{|r - g|}{|r - g| + |y - b|},$$

$$H_{y-b} = \frac{|y - b|}{|r - g| + |y - b|},$$

$$S = \frac{|r - g| + |y - b|}{|r - g| + |y - b| + |w - bk|}.$$

[...] is proportional to the net response [...]. [...] relates the net
response $r - g$ in the paired red-green system [...] of the chromatic
responses in the red-green and yellow-blue [...] the appropriate [...]
product of energy times responsiveness. The hue is either red or green
and/or yellow or blue in
accordance with the sign convention adopted for the particular components
of the paired systems. Saturation $S$ is also expressed as a percentage
relating the summed net responses of the red-green and yellow-blue
chromatic pairs to the sum of the chromatic responses plus that of the
achromatic white-black system. It will be noted that this expression for
saturation explicitly incorporates saturation changes produced by
intermixture with blackness, which is an important aspect of perceived
saturation that is almost always lost sight of in theoretical systems that deal
with the more limited stimulus variables rather than with the response
variables per se.
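
As an illustration of how the three attributes follow from the net paired responses, the sketch below computes brightness, the two hue coefficients, and saturation. It assumes the hue coefficients are ratios of each chromatic response magnitude to the summed chromatic magnitudes, and that saturation is the ratio of the summed chromatic magnitudes to the chromatic plus achromatic total; the function name, the input values, and the treatment of the purely achromatic case are hypothetical. Values are returned as fractions, so multiplying by 100 gives percentages.

```python
def perceived_attributes(red_green, yellow_blue, white_black):
    """Return (B, H_rg, H_yb, S) from the net responses r - g, y - b, w - bk."""
    chromatic = abs(red_green) + abs(yellow_blue)
    total = chromatic + abs(white_black)
    brightness = white_black
    if chromatic == 0.0:
        # Purely achromatic response: hue is undefined, reported here as zero.
        hue_rg = hue_yb = 0.0
    else:
        hue_rg = abs(red_green) / chromatic
        hue_yb = abs(yellow_blue) / chromatic
    saturation = chromatic / total if total else 0.0
    return brightness, hue_rg, hue_yb, saturation


if __name__ == "__main__":
    # Hypothetical net responses for a reddish yellow of moderate brightness.
    b, h_rg, h_yb, s = perceived_attributes(0.3, 0.1, 0.6)
    print(f"B={b:.2f}  H_rg={h_rg:.2f}  H_yb={h_yb:.2f}  S={s:.2f}")
    # The same chromatic responses with a larger achromatic magnitude are less saturated.
    print(perceived_attributes(0.3, 0.1, 1.6)[3])
```

Since the achromatic term enters the denominator through its magnitude, an increase in either whiteness or induced blackness lowers the computed saturation.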

With relations between stimulus variables and perceived color variables
expressible in this quantitative fashion, it becomes clear that this kind of
approach can be used to provide the basis for a systematic color metric
(Hurvich & Jameson, 1956). In such a system, color differences along any
one dimension of perceived color are assumed to be uniform. That is, a
unit percentage difference in saturation represents the same saturation
difference whatever the hue and whatever the brightness. Similarly, a
given percentage difference in hue has the same psychological significance
whatever the perceived saturation and brightness of the color, and so on.
With respect to differences in perceived color that involve combined
differences in more than a single psychological dimension, the model
assumes simple additive combinations of hue and saturation differences,
with equal weighting of the differences for the different attributes:
$\Delta C = \Delta H + \Delta S$ when $B$ is constant. This most simple assumption has yielded
rather good agreement with data from wavelength discrimination
experiments where apparent brightness is constant but hue and saturation vary
concomitantly (Hurvich & Jameson, 1955).

Even in the absence of any variation in pre-exposure or surround
stimulation, the stimulus spacing that corresponds to the perceived hue and
saturation space for constant brightness shows a strong dependence on
the level of constant brightness. The three functions relating response
magnitude to stimulus magnitude are assumed to differ among the
red-green, yellow-blue, and white-black paired response systems, and thus not
only the brightness but also the hue and saturation attributes of the color
evoked by a stimulus of fixed spectral distribution all vary systematically
with the energy level of the given stimulus. This assumption of the theory
is consistent with the empirical data, that is, the Bezold-Brücke spectral
hue shifts, the inverted U-shaped functions relating spectral saturation to
luminance at any wavelength, and the changing forms of the wavelength
discrimination function for different levels of spectral luminance.

Within the size limits of small stimulus areas for which there is
summation or some degree of area-intensity reciprocity when stimulus size is
reduced, the perceived color space undergoes changes that are comparable
to those associated with reduction in stimulus energy. This assumption of
the theory conforms to the observed color appearance and discrimination
changes usually described as the phenomena of small field tritanopia, or
sometimes by the misnomer of “foveal tritanopia” (Farnsworth, 1955;
Hartridge, 1945; Hurvich & Jameson, 1958). The name tritanopia is used
to categorize a very rare type of color defect in which yellow-blue hue
discriminations are lost, in contrast to the much more common red-green
losses that occur in the usual forms of color blindness or color deficiency.
In normal color vision, the term is applied to the situation that prevails
with very small stimulus fields, but not to the directly comparable state of
affairs found with reasonably large stimulus areas but very low levels of
stimulus energy. This inconsistency in terminology derives from
theoretical assumptions of no more than historical interest (König, 1894), and
hence the matter will not be pursued further here.
In an early statement of the opponent-process theory, the precise
forms of the three functions that express the three different dependencies
of response strength on stimulus luminance were not specified. However,
the differential rates were expressed by intensity-dependent multiplicative
factors whose numerical values differed for the three paired response
systems, and whose ratios to one another differed systematically for
different energy levels. The available evidence, especially for brightness
relations controlled by the achromatic white-black system, suggests
functions of the form

$$f_{y-b} = \Bigl[\sum_\lambda (e_\lambda y_\lambda - e_\lambda b_\lambda)\Bigr]^{[\ldots]},$$

$$f_{r-g} = \Bigl[\sum_\lambda (e_\lambda r_\lambda - e_\lambda g_\lambda)\Bigr]^{[\ldots]},$$

[...] than [...] powers, and [...] is known to be greater [...].
[...] values for the three paired systems are also assumed to
differ one from the other, and in a way such that [...] perceived color
space [...] achromatic dimension, and at near-threshold values [...]
dichromatic. The achromatic axis of the color space occurs along the
locus of values where $r - g = 0$ [...] identical for both members of
[...] achromatic balance of the [...] is not [...] central locus. At any
given level of brightness, however, the achromatic zone around the central
axis of achromatic balance varies in shape and area in stimulus space.
Thus, in perceived color space, all stimuli, including monochromatic
spectral wavelengths, are located at the achromatic center at the absolute
photopic threshold. At near-threshold levels only slightly above the
absolute threshold, only a small compass of weakly saturated red and green
hues is contained in the perceived color space. The total hue gamut, with
maximal saturations for the given viewing conditions, occurs at an
intermediate level of photopic luminance. As the luminance is increased to very
high levels, the hue gamut becomes progressively restricted again because
of the differential response rates among the three paired systems. But now
the restriction occurs in the compass of red and green hues. At extremely
high levels, yellow and blue hues also become strongly desaturated, and
eventually all stimuli again approach the achromatic axis in perceived color
space. This description of perceived color space is somewhat reminiscent
of diagrams of the color solid commonly used in textbook illustrations,
although the latter do not represent the different paired hue changes as
described here. The rough similarity is hardly surprising since these
illustrations are intended to describe qualitatively the same phenomena
that are handled here in terms of process characteristics.
With the focal stimulus unchanged, the locus of any color stimulus
in perceived color space is further modified when the field conditions are
changed by the introduction of surrounding or pre-exposure stimuli.
Under these circumstances, the expression $I$ in the basic definition of
perceived color given previously must be evaluated for each different set
of viewing conditions. In general, for the surround situation, each of the
induced responses $I_{1,2,3}$ is proportional to but opponent to the
corresponding response $R_s$ in the inducing (surround) area (Jameson & Hurvich,
1959, 1961b). For each of the three paired systems, the induced responses
are evaluated in terms of sets of simultaneous equations of the following
form:

$$R_f = f(S_f) - kR_s,$$

$$R_s = f(S_s) - kR_f.$$




In the set of equations, $R_f$ denotes the response of one of the three paired
response systems in the area of the visual field designated as focal and
$R_s$, the corresponding response in the area treated as surround. $S_f$ and $S_s$
represent the values of the stimuli impinging on the two areas, respectively,
and $f$ is a nonlinear function of the appropriate linear stimulus transform.
The induced responses are contained in the incremental expressions
$-kR_s$ and $-kR_f$ and, as the constant $k$ indicates, the induced activity is
proportional to the response activity in each of the different related visual
areas. The interactions occur between or among responses, and not
between or among stimuli. The induced responses could be linearly related
to the stimulus values only if the mutual interactions were assumed to occur
at the linear, that is, light reception, stage of the visual mechanism.
Analysis of data from a variety of brightness constancy and contrast
experiments supports the following formulation for induced responses in
the achromatic, white-black system (Jameson & Hurvich, 1961a, 1964):

$$R_f = L_f^{1/3} - kR_s.$$

Here the stimulus measures are in luminance units, and the response
function $f$ is the cube root.

One of the most striking features of contrast interactions, both as
observed in experiments and as derived from the formulation discussed
here, is the change in perceived color distance that occurs when arrays
of stimuli that are not perceived as very different when viewed individually
are viewed simultaneously (Hurvich & Jameson, 1960, 1961). A simple
effect of this sort is the conversion of a brightness magnitude scale from a
simple cube root function of luminance to a curvilinear function of the
form

$$y = aX^{1/3} - b$$

when the test stimuli are exposed in the presence of an illuminated
surround (Jameson & Hurvich, 1959, 1964).
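
The conversion from a simple cube root scale to a function of the form $aX^{1/3} - b$ can be traced by solving the simultaneous equations for a fixed surround. The sketch below assumes the symmetric pair $R_f = L_f^{1/3} - kR_s$ and $R_s = L_s^{1/3} - kR_f$; the second equation is written by symmetry with the first, and the luminance values, the value of $k$, and the function names are hypothetical.

```python
def induced_pair(lum_focal, lum_surround, k):
    """Solve R_f = L_f**(1/3) - k*R_s and R_s = L_s**(1/3) - k*R_f for (R_f, R_s)."""
    f_focal = lum_focal ** (1.0 / 3.0)
    f_surround = lum_surround ** (1.0 / 3.0)
    denom = 1.0 - k * k
    r_focal = (f_focal - k * f_surround) / denom
    r_surround = (f_surround - k * f_focal) / denom
    return r_focal, r_surround


def scale_constants(lum_surround, k):
    """Constants a, b such that R_f = a * L_f**(1/3) - b for a fixed surround."""
    denom = 1.0 - k * k
    return 1.0 / denom, k * lum_surround ** (1.0 / 3.0) / denom


if __name__ == "__main__":
    K = 0.2           # illustrative induction coefficient
    SURROUND = 27.0   # hypothetical surround luminance, arbitrary units
    a, b = scale_constants(SURROUND, K)
    for lum in (1.0, 8.0, 27.0, 64.0):
        r_f, _ = induced_pair(lum, SURROUND, K)
        print(f"L_f={lum:5.1f}  R_f={r_f:.4f}  a*L^(1/3)-b={a * lum ** (1 / 3) - b:.4f}")
```

With the surround held constant, the subtractive term $b$ is fixed, so test stimuli dim enough relative to the surround yield negative values, that is, responses on the blackness side of the white-black pair.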

An illustration of increased perceived color gamut and expansion of
color differences is given in Fig. 14. The points plotted as open circles
represent the colors of a series of equally bright stimuli, all of which fall in
the red-yellow quadrant of the perceived hue gamut when viewed
individually. For these perceptions there are no induced responses. If the
same stimuli are viewed simultaneously (points plotted as filled circles),
we may, as a rough approximation, consider each to be surrounded by an
average color that would have associated responses (without induction)
determined by the mean of the isolated color responses to the remaining
stimuli in the field. If these mean values for each of the three paired
variables of the response system are represented by $r_s (= -g_s)$, $y_s (= -b_s)$,
and $w_s$, and the responses for the individual areas by $r_f$, $y_f$, $w_f$, then the
individual response values with induction, $r_f'$, $y_f'$, $w_f'$, are

$$r_f' = \frac{r_f - kr_s}{1 - k^2},$$

$$y_f' = \frac{y_f - ky_s}{1 - k^2},$$

$$w_f' = \frac{w_f - kw_s}{1 - k^2}.$$




For the illustration in Fig. 14, the value of the induction coefficient $k$ has
been set equal to 0.2 for all three response systems (Jameson & Hurvich,
1961b, 1964), and the colors are expressed in hue and saturation terms for
a plane of constant brightness (see above).
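
The expansion of the gamut can be reproduced in outline with a few lines of code. This is only a sketch of the rough approximation described above: each stimulus is treated as surrounded by the mean of the isolated responses to the remaining stimuli, and the induced value is taken as (isolated response minus $k$ times the surround mean) divided by $1 - k^2$. The five pairs of isolated responses are hypothetical values placed in the red-yellow quadrant, not the data plotted in Fig. 14, and the function name is illustrative.

```python
K = 0.2  # induction coefficient, the value quoted for the illustration

# Hypothetical isolated responses (r - g, y - b) for five equally bright stimuli,
# all positive and therefore all in the red-yellow quadrant.
ISOLATED = [(0.9, 0.1), (0.7, 0.3), (0.5, 0.5), (0.3, 0.7), (0.1, 0.9)]


def with_induction(isolated, k=K):
    """Return the (r', y') pairs when all stimuli are viewed simultaneously."""
    induced = []
    for i, (r_f, y_f) in enumerate(isolated):
        others = [resp for j, resp in enumerate(isolated) if j != i]
        r_s = sum(r for r, _ in others) / len(others)  # mean surround red-green response
        y_s = sum(y for _, y in others) / len(others)  # mean surround yellow-blue response
        induced.append(((r_f - k * r_s) / (1.0 - k**2),
                        (y_f - k * y_s) / (1.0 - k**2)))
    return induced


if __name__ == "__main__":
    for (r, y), (r_i, y_i) in zip(ISOLATED, with_induction(ISOLATED)):
        print(f"isolated r={r:+.2f} y={y:+.2f}   with induction r'={r_i:+.3f} y'={y_i:+.3f}")
```

With these placeholder values the two extreme stimuli acquire a small response of opposite sign in one chromatic pair, so the set of colors is no longer confined to the red-yellow quadrant, and the range of responses across the array is wider than in isolation.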

In the real situation the spatial distribution of the colors in the array
would influence the induction strength (the magnitude expressed by $k$) in
terms of contiguities and separations, sizes, and retinal locations. The
perceptions may also be influenced by possible dependences on retinal
location of the functions $f_i$, $i = 1, 2, 3$, in the basic definition of perceived
color given at the beginning of this section. The model, as developed to
date, does not make any statements about this, and these are among the
variables that require further experimental study.



Fig. 14. Effect of contrast on perceived color gamut. Open circles represent each of five
equally bright stimuli seen in isolation. Note restriction to r − y quadrant; filled
circles represent expanded gamut produced by mutual interactions. Plot is in polar
coordinates where hue varies circumferentially and saturation increases radially from
center.
A process theory of this sort can, of course, look to physiological data
to support its basic hypotheses. Such evidence is available but has not
been included in this account since this sort of approach does not depend
on a knowledge of the physiology but is based rather on inferences from a
wide variety of psychophysical data.


3.8 Some Characteristics of the Psychophysical Experiments

Many experiments in color vision present special problems for
measurement theory. This is so in some instances because of the stimulus controls
and methods used and in others because of the kinds of judgmental
criteria that are involved. Criterion responses in color experiments
frequently require the abstraction of a single perceptual attribute. This is
sometimes done by using some indirect, but unambiguous, criterion to
define the perceptual endpoint, and spectral luminosity determinations
provide good examples of this sort. Measurements of relative luminosity
for an equal energy spectrum involve the determination of the energies
required at a series of spectral wavelengths to meet a specified criterion
response (Wright, 1949). The criterion is taken, either directly or indirectly,
as an index of constant apparent brightness. The measures may be
determinations of the absolute photopic threshold; this criterion
presumably implies a uniformly minimal perceptible brightness level. The
measures may be determined by a flicker fusion technique in which a
standard of fixed intensity and a test stimulus of variable intensity are
alternated at a constant rate; here the criterion of fusion is presumably
an index of equality of brightness between alternating standard stimulus
[...] wavelength. Or the functions may be based on [...]. Here the
criterion of equal brightness is used directly [...] must be judged to be
equal in brightness in the presence of hue [...] (heterochromatic
brightness matches) [...] comparison stimuli. In all three instances, plots
of the energy [...] as dependent variable against stimulus wavelength
as independent [...] that describe “equal brightness [...]” [...]
spectral luminosity functions. The different measures obviously involve
different [...] response criteria, and the resulting functions [...]
somewhat different from one method to the next.

[...] wavelength of a comparison stimulus [...] measured for a hue
match to a test stimulus of fixed wavelength at a series of different
test luminances.
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Hence the measures depend on a judgment of equality m hue in the presence 
of bnghtness and saturation differences The measures of opponent 
chromatic responses discussed in the preceding section also depend on a 
specific hue criterion, that is, neither yellow nor blue or neither red nor 
green for the index response The appearance of the light mixture for the 
given index response is thus not constant, but vanes from one combination 
of test and cancellation stimuli to the next 

In all these instances, it is difficult to guarantee that the criterion
response for one attribute is not biased or confounded by variations in the
other color attributes presumed to be irrelevant.

The same problem arises in connection with measures of discrimination
thresholds. There are two frequently used measures of spectral color
discrimination, one that relates to discriminability along the spectrum
locus, the other to the first discriminable step from a specified "white"
point in the color-mixture space along each of a series of lines radiating
toward the spectrum locus. Both these types of discrimination experiment
also require the abstraction of one attribute of perceived color from the
other two. In the wavelength discrimination experiment, the aim is to
determine that spectral stimulus, λ + Δλ or λ − Δλ, in one-half of a
bipartite field that is just discriminably different from the test wavelength
λ in the other half when there is no discriminable brightness difference
between the two stimuli. In some procedures a judgment of "same" or
"different" is required for discrete pairs of stimuli whose energies have
been adjusted for equal luminance in terms of a separately determined
spectral luminosity function. This procedure makes the decision task a
relatively easy one for the observer, but leaves a margin of uncertainty
about the "no discriminable difference in brightness" assumption. An
alternative to this procedure is to provide the observer with a control knob
for varying the energy of the comparison wavelength. The test and
comparison stimuli are presented and the observer's task is to attempt to
produce a complete color match between the two wavelengths by adjusting
the energy (and with it, the brightness) of the comparison wavelength. In
this case the observer's judgment is "match" or "no match," and the
measures represent just discriminable color differences that cannot be
eliminated by the elimination of a brightness difference. Such Δλ
experiments are sometimes described as measures of spectral hue
discrimination, but the description is erroneous since equally bright
spectral stimuli may differ from each other either in hue or in saturation
or, more often, in both hue and saturation.

The second kind of spectral discrimination measure relates more
directly to the saturation attribute of color. These are the measures of
least discriminable colorimetric purity. Here, the aim is to determine the
luminance L_λ of a test wavelength λ that must be intermixed with a "white"
light of luminance L_w to produce a just discriminable difference in color
between this mixture and the "white" stimulus presented alone in the other
half of a bipartite field when there is no discriminable difference in
brightness between the two fields. Again, the brightness attribute must be
abstracted and made irrelevant to the discrimination, and the procedural
problems posed by this requirement are essentially the same as those in the
wavelength discrimination experiments.

In both types of experiments, not only must brightness differences
be excluded between test and comparison stimuli in the two halves of the
bipartite field but the brightness levels should be constant for all stimuli
used to determine a given spectral function. This is so because there is
sufficient evidence to justify the expectation that the discrimination
measures will vary with luminance level for any given spectral region as
well as with wavelength for any given luminance level (Hurvich &
Jameson, 1955; Purdy, 1929; Weale, 1951). The uniform spectral
luminance level requirement is met for measures of purity or "saturation"
discrimination where the standard "white" field can be kept constant
throughout. Frequently, however, it is not met in wavelength
discrimination experiments, largely because of instrumental limitations
that provide only relatively low energy levels of the monochromatic bands
in the short wave spectral regions (Wright & Pitt, 1934). Ideally, both
wavelength discrimination and purity discrimination should be measured and
described in terms of families of spectral functions for a series of levels
of uniform spectral luminance. In any event, the luminance and wavelength
dependencies should not be confounded in a single discrimination function,
and the parametric value for which the function was obtained should be
reported as an essential datum.


Can we derive discrimination measures from the spread of three-variable
spectral color-matching data? How are the errors distributed in
these measurements? The average data for spectral color mixture are
[...] primaries to another. Can we [...] measure of discrimination for the
same experiments [...] transformed in the same way and with the same
justification as mean color equations?

[...] distribution per se, we might examine the [...] necessity for
desaturating the test stimuli when making spectral [...] matches in any
three-variable [...] judgments between stimuli [...] the loci of the three
[...]. The locus of spectral stimuli falls, for the most part, outside the
limits of the mixture triangle, and hence their mixture equivalents must be
derived from matching data in which the test stimulus is a composite made
up of the spectral stimulus in question combined with some amount of one of
the mixture primaries. The actual locus of the test stimulus in the mixture
space consequently depends on the wavelength and energy of the spectral
stimulus, the wavelength of the spectral primary used for desaturation, and
the amount of this desaturating primary. Thus the test stimuli actually
used in a spectral color-matching experiment will, in general, depend on
the mixture primaries, and the error measures will be specific for these
binary test colors. Hence the spread of the color match data determines the
uncertainty limits of the spectral color equation, but obviously cannot
serve as a measure of spectral color discrimination.

The error distributions of the matches for the actual test colors used
also present some difficulties for analysis as discrimination measures. In
such experiments, the method of adjustment is used, and three separate
adjustments must be made for each match. The decision process is a
continuous one as between "identity" or "nonidentity," and adjustments
involve a combination of trial-and-error manipulations, and learned
associations between specific manipulations of the stimulus controls and
specific color differences to be eliminated that are abstracted from the
general nonidentity judgments. Final settings usually involve bracketing
between discriminable limits of the stimulus variations, and thus the nature
of the instrumental control can influence the final match setting for which
the observer tries to center each stimulus variable within the interval
of uncertainty or the range of acceptable settings (Stiles, 1955a, 1955b).

In his redetermination of the color-mixture data to serve as a new
international standard, Stiles tried to reduce the instrumental error that
bracketing with a logarithmic stimulus control introduces. He did this
by adding more desaturating stimulus than necessary for particular matches
in order to bring the test color to a chromaticity for which the interval of
uncertainty corresponds to a small range of stimulus variation along each
of the three mixture-stimulus continua. In so doing, Stiles' concern was
not with the effect of this source of error on the distribution of the
individual match settings, but with the possibility that it contributed an
unknown constant error to the arithmetic mean of the data. For metameric
color matches there is, of course, no way of evaluating such a constant
error in terms of a color equation known to represent a "true" match, since
the "accurate" metameric color match depends in each instance on the color
vision characteristics of the individual observer.

We have mentioned earlier that in his determinations of discrimination
ellipses for equiluminous color matches, MacAdam (1942) selected paired
filter primaries along each of a series of intersecting lines in the constant
luminance chromaticity plane, and the selection was such that the
intersection locus represented, in each instance, the chromaticity of a
mixture stimulus of very nearly the same spectral distribution as the
standard. This provides a locus of (approximate) stimulus identity which
necessarily coincides with a "true" locus of perceived color identity for
any individual observer. MacAdam's experiment also employed a method of
adjustment, and the instrumental controls were such that for matches to a
given color location along a single binary stimulus continuum, the
variation from one endpoint to another was always accomplished by a
rotation of 360° of a single knob. If the measures along different continua
are to be considered as free from instrumental bias related to the rate of
change of the mixture proportions with respect to angular rotation of the
knob, it should be demonstrated that the spread of the settings along a
single continuum is independent of this rate of change. MacAdam has made
such checks (personal communication). The check data, however, were not
included in his published report, and it is not clear from the report that
the discrimination ellipses are necessarily free from such bias. From the
published data for the discrimination limits and the chromaticities of
the binary mixture primaries, one can determine for all continua through
a given standard color the correlation between the obtained distance
that represents unit perceptual distance and the manual rate of change
of the instrumental control. We have calculated rank order correlations,
based in each case on 6 to 9 pairs, for 25 MacAdam ellipses. Of the
25 correlations, 23 are nonnegative, and of these, 19 are above 0.40
and 7 are above 0.80. Such a rough statistical check is, of course, not
conclusive, but it does suggest that manual features of the adjustment
introduced an instrumental bias in the discrimination
measures described by the MacAdam ellipses.
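
A rank-order check of this kind can be sketched as follows. This is only an illustration of the statistic, not the computation actually carried out: the function names and the sample values are hypothetical, and tied values are given average ranks, which is an assumption since the text does not say how ties were treated.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_order_correlation(xs, ys):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for six continua through one standard color:
# the distance corresponding to the discrimination limit, and the rate of
# change of the instrumental control along each continuum.
limit_distance = [1.2, 0.8, 2.1, 1.7, 0.9, 1.4]
control_rate = [0.30, 0.25, 0.55, 0.40, 0.20, 0.45]
rho = rank_order_correlation(limit_distance, control_rate)
```

A value of `rho` near 1 for most standards would point in the same direction as the tabulation reported above; with only 6 to 9 pairs per ellipse, any single coefficient is weak evidence.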

MacAdam's binary mixture experiments and the [...] of Brown and MacAdam
(1949) [...] the statistical nature of [...] (Silberstein & MacAdam, 1945).
When a [...] of the stimulus [...] readings, one triple for each [...] to
associate the standard color with the mode as a set of [...]. Thus, for a
given observer and a specified set of mixture [...] instrumental stimulus
controls, a one-to-one correspondence is established between colors and
triples of real numbers. Each triple may be denoted by a single letter,
X, Y, etc., where each triple corresponds to a color. If the mixture
stimuli and controls are changed, we may denote the triples as X', Y',
etc., with X and X', Y and Y', etc., corresponding to the same color in
the original and altered setup, respectively. If X is the standard, the
trivariate density function will be denoted f_X; [...] probability density
of getting observation Y, for the unprimed conditions, when the standard
is such that the modal observation is X.

It seems plausible to propose that the distance between the colors
corresponding to X and Y is some monotonically decreasing function of
f_X(Y) that reaches 0 at the maximum of f_X, that is, [...] the distance
from X to itself is 0. Clearly, for such a definition to yield a
metric, it is desirable that [...]

If another simple assumption is made, namely, that the transformation
from probability density to distance has the simple form φ(t) = −log t
plus a constant, then it follows that the density function f_X is
approximately normal, and a_ij is its covariant matrix. Thus the theoretical
argument is essentially that the normal distribution is a good candidate
for the distributions f_X since it is unchanged in form under linear
transformations of coordinates, and therefore approximately invariant under
arbitrary differentiable transformation of coordinates. If the probability
density is trivariate normal, approximately, then the metric, defined as
minus log probability plus constant, is Riemannian.
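
The link between a normal density and a quadratic distance can be made concrete with a small sketch. It assumes an exactly normal density with a known precision matrix (the inverse of the covariance matrix); the matrix and the triples below are hypothetical values chosen only for illustration, and the function names are not from the source.

```python
def quadratic_form(diff, matrix):
    """Return diff^T * matrix * diff for a small dense matrix."""
    n = len(diff)
    return sum(diff[i] * matrix[i][j] * diff[j]
               for i in range(n) for j in range(n))


def log_density_up_to_constant(y, mode, precision):
    """Log of a normal density centered at `mode`, omitting the
    normalizing constant (which does not depend on y)."""
    diff = [a - b for a, b in zip(y, mode)]
    return -0.5 * quadratic_form(diff, precision)


def distance_from_density(y, mode, precision):
    """Minus log density, plus the constant that makes the distance
    from the mode to itself equal to zero."""
    return -(log_density_up_to_constant(y, mode, precision)
             - log_density_up_to_constant(mode, mode, precision))


# Hypothetical precision matrix (inverse covariance) for one standard color.
precision = [[4.0, 1.0, 0.0],
             [1.0, 3.0, 0.5],
             [0.0, 0.5, 2.0]]
standard = [0.30, 0.40, 0.30]
match = [0.32, 0.39, 0.29]
d = distance_from_density(match, standard, precision)
```

Under these assumptions the distance is half a quadratic form in the coordinate differences, so the contours of equal distance are ellipsoids whose shape is fixed by the precision matrix.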

This theory, of course, underlies the use of the contour ellipsoids
obtained by free matching to estimate the values of the metric tensor in
CIE coordinates by Brown and MacAdam (1949) for three-variable
matches, and by MacAdam (1942) for the ellipses that describe his
two-variable color-matching data. The theory depends critically on the
assumption that the density functions f_X and f_X' satisfy the invariance
requirement. The invariance assumption is subject to empirical test, and
such tests would seem critical to the use of color-match data in this way
as measures of minimal color distances.


3.9 Metric Scaling Model Based on Similarity Judgments

To date, there has been no systematic attempt to obtain empirical
measures of both small and large perceived color differences for comparable
conditions of observation. Such measurement is important for the
development of a metric for a color space that represents the similarity
and discriminability aspects of the color domain.

A metric space is a mathematical structure consisting of a set S and a
function d which assigns to each pair, x, y, of elements of S a real number,
denoted d(x, y), such that the following axioms are satisfied (x, y, and z
are arbitrary members of S):


(M1)  $d(x, y) \geq 0$,

(M2)  $d(x, y) = 0$ if and only if $x = y$,

(M3)  $d(x, y) = d(y, x)$  (symmetry),

(M4)  $d(x, y) + d(y, z) \geq d(x, z)$  (triangle inequality).
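
For a finite set of stimuli with tabulated or computed distances, the four axioms can be checked mechanically. The sketch below is illustrative only; the function name, the tolerance, and the two example distance functions are assumptions, not part of the source.

```python
from itertools import product


def violated_axioms(points, d, tol=1e-12):
    """Return the set of metric axioms (M1-M4) that d violates on a finite set."""
    violated = set()
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:
            violated.add("M1")  # a distance is negative
        if (abs(d(x, y)) <= tol) != (x == y):
            violated.add("M2")  # zero distance must mean identical points
        if abs(d(x, y) - d(y, x)) > tol:
            violated.add("M3")  # symmetry fails
    for x, y, z in product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:
            violated.add("M4")  # triangle inequality fails
    return violated


points = [0.0, 1.0, 2.5]
absolute = violated_axioms(points, lambda x, y: abs(x - y))
squared = violated_axioms(points, lambda x, y: (x - y) ** 2)
```

Here the absolute difference passes all four axioms, whereas the squared difference fails only M4, which illustrates that a monotonic transformation of a metric need not itself be a metric.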


[...] introduced in such a mathematical [...] mathematical tools are
available for the analysis of the geometrical properties of a metric space.
It is hoped that the similarity aspects of the stimulus domain ultimately
may be characterized by means of geometrical invariants and, as a further
step, that the systematic effects of relevant variables on color perception
may be characterized by means of geometrical transformations. Such an
empirical development would impose specific restraints on theoretical
models of the color vision system. It would provide a test for theories
such as Stiles' line-element and the opponent colors theories as well as
for the various forms of color difference equations that have been proposed
from time to time (Judd & Wyszecki, 1963).

The work of MacAdam, Silberstein, and Brown discussed in Secs. 3.5
and 3.8 attempts to develop a metric representation of small color
differences. Recent applications of multidimensional scaling to color
problems can be found in Torgerson (1952), Messick (1956), Indow and
Kanazawa (1960), and Shepard (1962a, 1962b). These investigators have not,
however, attempted to encompass both minimal discriminability and large
color differences.

Here, we outline a proposed metric scaling model that may provide a
fruitful approach to the empirical measurement and representation of
perceived color differences. The corresponding experimental procedure
entails comparisons of perceived stimulus differences.

One methodological question might be considered at the outset: why
make comparisons of stimulus differences rather than scale stimuli
directly by magnitude estimation?

In the method of comparisons, the responses to repeated presentations
of two stimulus pairs are assumed to be representable as a sequence of
(independent) Bernoulli trials with fixed probability. If this assumption is
satisfied (see Bush, Galanter, & Luce, 1963), then a large number of such
presentations yields a fairly accurate estimate of the probability with which
one pair is chosen as exhibiting the lesser difference. In terms of this
probability, we can define a measure that can be interpreted as an
expression of the relative distances exhibited by the two pairs. With
magnitude estimation, however, the mean or median of the numbers emitted by
the subject to a pair of stimuli can be taken as the distance measure for
that pair after relatively few stimulus presentations. Although magnitude
estimation has been used rather extensively to scale a variety of stimulus
continua, it has not been used to estimate the very small differences
required to subsume the measurement of discriminability as well as global
similarity.
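
How many presentations count as "a large number" can be gauged from the binomial standard error. The sketch below assumes independent Bernoulli trials with fixed probability, as the text does; the function name and the counts are hypothetical.

```python
import math


def estimate_choice_probability(times_chosen, presentations):
    """Estimate a fixed Bernoulli probability and its standard error."""
    p_hat = times_chosen / presentations
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / presentations)
    return p_hat, std_err


# Hypothetical counts: one pair chosen as the lesser difference
# 30 times in 100 presentations, and 120 times in 400 presentations.
p_100, se_100 = estimate_choice_probability(30, 100)
p_400, se_400 = estimate_choice_probability(120, 400)
```

The standard error shrinks only with the square root of the number of presentations, so quadrupling the presentations halves it. This is the cost, relative to magnitude estimation, that the passage above acknowledges.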

There is, however, a more fundamental reason for rejecting the seemingly
more efficient method of magnitude estimation. The numbers obtained by
magnitude estimation need not satisfy the axioms M1 through M4 and,
in particular, one has no intuition, based on the structure of the magnitude
estimation task, as to why or whether axiom M4, the triangle inequality,
might or might not hold. Although some transformation of the magnitude
estimates rather than the estimates themselves might be the underlying
metric, there is no a priori or theoretical reason for choosing any
particular transformation. Some restriction on the possible monotonic
transformations might be obtainable from a measurement theory for
magnitude estimation. Although some investigators are convinced that
magnitude estimation does yield "ratio scale measurement" (Stevens,
1957), no measurement theory has been developed for magnitude estimation
to justify fully this conviction (Luce & Galanter, 1963b). Another
possibility for narrowing down the possible transformations of magnitude
estimates is found in the work of Shepard (1962a, 1962b), who has
devised a method for obtaining suitable monotonic transformations of
"proximity measures" to yield a metric. His method requires, however,
that the form of the geometry be assumed in advance to a very great
extent, that is, it must be assumed to be Euclidean or some equally strong
substitute.

With the method of paired comparisons, the situation is quite different.
The development sketched below extends Luce's choice analysis of
similarity judgments (Luce, 1961; Luce & Galanter, 1963b). The choice axiom
of Luce (1959), which is directly testable, leads in the case of paired
comparisons to a ratio scale v defined on pairs of stimuli, such that the
probability P that the difference between x and y in the pair (x, y) is less
than the difference between z and w in the pair (z, w) is given by the
following equation, provided that P is different from 0 and 1:


$$P[(x, y), (z, w)] = \frac{v(x, y)}{v(x, y) + v(z, w)}. \tag{1}$$
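
Reading Eq. 1 as P[(x, y), (z, w)] = v(x, y) / [v(x, y) + v(z, w)], the relation and its inverse can be written out directly. The sketch below is a minimal illustration under that reading; the function names are hypothetical.

```python
def choice_probability(v_xy, v_zw):
    """Eq. 1: probability that pair (x, y) is judged the lesser difference."""
    if v_xy <= 0 or v_zw <= 0:
        raise ValueError("ratio-scale values must be positive")
    return v_xy / (v_xy + v_zw)


def scale_ratio(p):
    """Invert Eq. 1: recover v(x, y) / v(z, w) from a probability
    strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("Eq. 1 applies only when P differs from 0 and 1")
    return p / (1.0 - p)


p = choice_probability(3.0, 1.0)   # pair (x, y) has three times the scale value
ratio = scale_ratio(p)             # recovers the ratio of scale values
```

Only ratios of scale values are recoverable from the probabilities, which is why v is determined as a ratio scale and why probabilities of exactly 0 or 1 must be excluded.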


[...] similarity of [...] and [...] monotonic decreasing function of it
may be an underlying distance [...] with magnitude estimation [...]. In
the present case, the measurement [...] restriction, the fact that v is a
ratio [...] appreciable probability that the subject will choose the
(x, y) difference as [...]. Then it is plausible that for any third
stimulus z, the [...] (x, z) and (y, z). Thus, when P[(x, y), (y, y)],
which is always ≤ 1/2, becomes appreciably greater than 0, then for any z,
P[(x, z), (y, z)] will also be greater than 0 and at least as close to 1/2
as P[(x, y), (y, y)]. That is, the confusability between x and y with
respect to a third stimulus z is unlikely to be less than the confusability
between x and y with respect to y itself.* Thus we can state as a plausible
assumption

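$$P[(x, y), (y, y)] \leq P[(x, z), (y, z)]. \tag{2}$$

If the probabilities in Inequality 2 are different from 0 and 1, then we may
use Eq. 1 to infer that

$$\frac{v(x, y)}{v(y, y)} \leq \frac{v(x, z)}{v(y, z)}. \tag{3}$$

Clearly, Inequality 3 is equivalent to

$$\log \frac{\sqrt{v(x, x)\,v(y, y)}}{v(x, y)} + \log \frac{\sqrt{v(y, y)\,v(z, z)}}{v(y, z)} \geq \log \frac{\sqrt{v(x, x)\,v(z, z)}}{v(x, z)}.$$

Thus the transformation

$$d(x, y) = \log \frac{\sqrt{v(x, x)\,v(y, y)}}{v(x, y)}$$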


yields a possible definition of a distance measure in terms of the v-scale.
It is designed so that the triangle inequality M4 holds at least for x, y,
and z that are quite similar to one another. The other metric axioms are
also satisfied by the function d, provided that we assume
P[(x, y), (y, y)] ≤ 1/2 for all x, y and provided that stimuli x and y such
that d(x, y) = 0 will be identified as part of the same metamer class. The
distance is then the distance between metamer classes.
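
A small numerical sketch may help fix the definition d(x, y) = log[√(v(x, x) v(y, y)) / v(x, y)]. The similarity scale used here, v(x, y) = exp(−|x − y|) on the real line, is purely hypothetical and was chosen because the resulting distance reduces to |x − y|; nothing in the source commits the v-scale to this form.

```python
import math


def distance(v, x, y):
    """d(x, y) = log( sqrt(v(x, x) * v(y, y)) / v(x, y) )."""
    return math.log(math.sqrt(v(x, x) * v(y, y)) / v(x, y))


def hypothetical_v(x, y):
    """Illustrative similarity scale: decays with separation on a line."""
    return math.exp(-abs(x - y))


x, y, z = 0.0, 0.5, 2.0
d_xy = distance(hypothetical_v, x, y)
d_yz = distance(hypothetical_v, y, z)
d_xz = distance(hypothetical_v, x, z)

# Inequality 3 for the same triple: v(x, y)/v(y, y) <= v(x, z)/v(y, z).
inequality_3_holds = (hypothetical_v(x, y) / hypothetical_v(y, y)
                      <= hypothetical_v(x, z) / hypothetical_v(y, z))
```

For this choice of v, Inequality 3 and the triangle inequality hold together, as the equivalence stated above requires; an empirically estimated v-scale would have to be checked triple by triple.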


The admission of probabilities of 0 and 1 brings about interesting
mathematical problems, but these can all be solved if the assumption of
Inequality 2 is extended a bit further. Informally, the required
generalization is as follows. Let x, x', y, and z be four stimuli such that
y is closer than z to both x and x'. We assume that the confusion of x and
x', when their differences from y are compared, is less than when their
differences from z are compared, that is, P[(x, y), (x', y)] is farther
from 1/2 than is P[(x, z), (x', z)]. This reduces to Inequality 2 when
x' = y.

These assumptions about confusability, together with the basic choice
axiom underlying v-scale measurement, can be tested directly by
experiment. If it turns out that these measurement assumptions are found to
be tenable, then we have a solid base for the experimental determination of
perceived interstimulus distances that is independent of any a priori
notions about the nature of the underlying mechanisms, and even of the
validity of any of the so-called laws of visual behavior. Thus, for example,
the well-established laws of color mixture that specify the characteristics
of metamerism need not be known a priori but would presumably emerge
from an analysis of the experimentally determined choice probabilities.
The procedures are, moreover, ideally suited to determine, rather than
assume, the relations between units of discriminability and units of
perceived magnitude along stimulus continua.

* A priori perceptual analysis suggests that this assumption may not be
applicable to exceptional instances of small perceptual distances where y
lies between x and z.

The need for and importance of the application of sophisticated
measurement theory and metric ideas to the area of color vision do not
imply that this kind of empirically determined metric necessarily leads to a
better understanding of the underlying processes or that it constitutes
the most fruitful approach to such an understanding. In many instances,
specific intuitions of potentially fundamental importance can probably be
tested directly and efficiently in experiments that are hypothesis oriented.
The empirical development of the metric approach should, however,
provide a rigorous language in which to develop intuitions, frame
hypotheses, and test theories.


Identification Learning


Following the strategy of the older sciences, mathematical psychology
has been primarily concerned with the simplest possible abstractions of
the real world. Mathematical learning theory, in particular, has
concentrated on a particular kind of simple learning: two-choice behavior
in a single repeated stimulus environment. Except for Sec. 5 of Chapter
10, that chapter as well as Chapter 9 of this Handbook has been restricted
to problems in this form of simple learning. Psychophysics, on the other
hand, has long been concerned with two or more stimulus presentations
by the experimenter; the questions being asked have no meaning when a
subject believes that a single stimulus presentation is used on every trial
of an experiment. In attempts to unify learning theory and psychophysics,
therefore, we must deal with experiments containing two or more
presentations.*

Experimental studies of animal learning include a large number of
experiments with two distinct stimulus presentations. For some time the
term "discrimination" has been applied to such studies. In jumping-stand
experiments, for example, one presentation includes a black card on the
left and a white card on the right, and the other presentation has the cards
reversed. Or in a V-maze experiment, a steady light and a flickering light
are used to differentiate the two presentations. Most informal theories
of learning and a few published mathematical models have attempted to
account for the results of such experiments.

The analogous human learning studies have used many stimulus
presentations rather than just two. The most common design is
paired-associates learning. With only two stimulus words, subjects learn too
rapidly for the phenomenon to be interesting, and so twenty or so words
are normally used in these experiments.

The psychophysical designs called "detection" and "recognition" can
be given the same abstract characterization as the animal "discrimination"
experiments; each such experiment uses two stimulus presentations.
The so-called k-alternative forced-choice experiments in psychophysics,
like the paired-associates studies, use many presentations rather than only
two, but otherwise they are similar.

* The term stimulus presentation pertains to the complete configuration of
stimuli presented to a subject on a single trial and not to that portion
which is "sampled" or "perceived" by the organism. See Chapter 2 for a
discussion of terminology.

All these experimental designs, animal discrimination learning,
paired-associates learning, detection, recognition, and forced-choice, have
been placed into a single category called "complete identification"
experiments (see Chapter 2). They are characterized by a one-to-one
correspondence between the stimulus presentation set and the response set.
This chapter is devoted to models for two-alternative complete
identification experiments, and so the prototypes are animal discrimination
and human recognition designs. The stimulus presentation set is denoted by

$$S = \{s_1, s_0\}$$

and the response set by

$$R = \{r_1, r_0\}.$$

The presentation probabilities are assumed constant, and so we let

$$P = \Pr(s_1 \text{ on trial } n),$$

$$1 - P = \Pr(s_0 \text{ on trial } n).$$

Two conditional response probabilities are needed:

$$p_n = \Pr(r_1 \text{ on trial } n \mid s_1 \text{ on trial } n),$$

$$q_n = \Pr(r_1 \text{ on trial } n \mid s_0 \text{ on trial } n).$$

Note that in general, p_n and q_n do not add to unity.

The only experiments to be considered are those in which r_1 is always
"correct" when s_1 is presented, and r_0 is always "correct" when s_0 is
presented; partial reinforcement procedures will not be discussed. The
outcome set, then, has two elements,

$$O = \{o_1, o_0\}.$$

It is assumed that o_1 is preferred to o_0, and so the payoff matrix is as
shown in the following display:

              r_1    r_0
      s_1     o_1    o_0
      s_0     o_0    o_1

A number of factors beyond the species of subjects employed distinguish
the animal discrimination and human recognition designs. First, the
transient behavior is of primary interest in the animal studies, whereas
only asymptotic behavior is examined in the human experiments. [...] the
stimulus presentations are intended to [...] "confusable" for [...]
usually the animals "discriminate" perfectly after learning is complete:
p_n tends to unity and q_n tends to 0. The human subjects, on the other
hand, stabilize at response probabilities other than one and zero. The
animal data are usually plotted in the form of learning curves such as
those sketched in Fig. 1. (All too often these curves are combined into
proportion of "correct" responses versus trials; this loses information
which may be useful in model testing.) The human recognition data are often
plotted, as shown in Fig. 2, to give an "ROC curve" or "isosensitivity
curve." Each point on such a curve is obtained from the asymptotic response
probabilities that result from a fixed set of experimental conditions.
Various points on the curve are obtained with fixed stimulus presentations
and variable presentation probabilities, P and 1 − P, or various values of
the outcomes in the payoff matrix (cf. Chapter 3 of this Handbook).
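
The remark that a combined proportion-correct curve loses information can be illustrated numerically. Taking p as Pr(r_1 | s_1) and q as Pr(r_1 | s_0), the probability of a correct response is P p + (1 − P)(1 − q). The sketch below uses hypothetical values and a hypothetical function name.

```python
def proportion_correct(P, p, q):
    """Overall probability of a correct response on a trial.

    P: probability that s_1 is presented
    p: Pr(r_1 | s_1), correct when s_1 is presented
    q: Pr(r_1 | s_0), so 1 - q is correct when s_0 is presented
    """
    return P * p + (1.0 - P) * (1.0 - q)


# Two hypothetical subjects with equal presentation probabilities.
subject_a = proportion_correct(0.5, 0.9, 0.3)   # biased toward r_1
subject_b = proportion_correct(0.5, 0.8, 0.2)   # no response bias
```

Both hypothetical subjects are correct on about 80 per cent of trials, yet they occupy different points, (q, p) = (0.3, 0.9) and (0.2, 0.8), in the plane used for the isosensitivity curve. A single proportion-correct value cannot distinguish them.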

The traditional preoccupation with transient behavior in animals and
asymptotic behavior in humans is probably an accident of history. In
most recognition experiments the subjects are explicitly instructed about
which response is correct for each stimulus presentation, and so learning
occurs very rapidly. This procedure, however, is not necessary; human
learning can be examined easily if such instructions are omitted or
modified. Confusable stimuli are as easy to present as distinct ones in
animal experiments, and asymptotic behavior can then be examined if
enough trials are run. The latter has seldom been done, perhaps because
there is very little evidence that animals will, in fact, stabilize at
response probabilities other than one and zero. As experimentalists well
know, animals readily develop position habits in discrimination learning
experiments. Often great care is taken to prevent them from developing by
properly choosing the stimuli, modifying the early part of the presentation
schedule, or using forced trials. Sometimes animals are discarded from
an experiment because they never "learned the discrimination." It appears
that discrimination, like woman, is a sometime thing. This observation
presents a challenge to mathematical models: to specify the conditions
under which an animal will or will not learn to respond asymptotically
without error. The experimental facts seem to be these. Sometimes
perfect learning occurs (p_n → 1 and q_n → 0), and sometimes position
habits develop (p_n → 0 and q_n → 0, or p_n → 1 and q_n → 1). Whether or
not intermediate asymptotes ever occur is not clear from the animal
literature. The human data clearly show that intermediate asymptotes
develop when the stimuli are confusable and that perfect identification
occurs when the stimuli are sufficiently different from one another.
Whether "position habits" (one response always occurring) can be
produced with humans has not been explored systematically.


1. Dual-Process Models for [...]

During the last fourteen years, several models for discrimination
learning have appeared in the literature. None of these has turned out to be
very satisfactory, either because it defied detailed mathematical analysis
or because it led to a serious conceptual difficulty or both. Nevertheless,
[...] some of these models seems appropriate here if for [...]

[...] learning models have one thing in common: [...] psychological
processes. The first is a [...] for simple learning. When a single stimulus
is repeatedly presented, stimulus elements become "conditioned" to a [...]
experimental events [...] response probability to describe [...]. The other
process is perceptual; by [...] or another the animal learns to pay
attention to the appropriate [...] stimulus configurations that are thought
to differentiate them. This [...] of discrimination learning has a good
[...] placed in the startbox of a T-maze [...] but that on some trials a
red light appears at the choice point and [...] other trials a green light
is there. Let food be placed in the left [...] light is on and in the
right-hand box when the green [...]. The light is clearly the
"discriminative cue," but each stimulus [...] contains many other cues
which appear on all trials. [...] attention to the light and to ignore
everything else [...] believed. Observed cues become conditioned or
deconditioned to a [...] response, depending on whether reward or [...]
response. Thus the cues associated with the red light [...] conditioned to
the left-turning response only, those [...] right-turning response only,
and the common cues are conditioned in part to both responses. For perfect
learning to occur, [...] must become ineffective. [...] are embodied in
each of the [...] general point and the difficulties to which it leads.

1.1 The Bush-Mosteller Model

One of the first formal treatments of discrimination learning was given
by Bush and Mosteller (1951). Presentation $s_1$ is represented by a
set $S_1$ of stimulus elements and presentation $s_0$ by a set $S_0$ of
elements. A measure function $m(X)$ is defined for any subset $X$ of
$S_1 \cup S_0$, and a similarity index is defined by

$$\eta = \frac{m(S_1 \cap S_0)}{m(S_1)}.$$

Following Estes (1950), a subset $C$ of $S_1$ is conditioned to
response $r_1$, and the probability of $r_1$, given $s_1$, is

$$p = \frac{m(C)}{m(S_1)}.$$

The conditioned subset $C$ is partitioned into
$I \subset S_1 \cap S_0$ and $T \subset S_1 - S_0$, so that

$$p = \frac{m(T) + m(I)}{m(S_1)}.$$

Letting

$$\alpha = \frac{m(T)}{m(S_1 - S_0)}, \qquad \beta = \frac{m(I)}{m(S_1 \cap S_0)},$$

the probability of response $r_1$, given $s_1$, becomes

$$p = \alpha(1 - \eta) + \beta\eta.$$

The interpretation of this last equation is clear: $\alpha$ is the
proportion of measure on $S_1 - S_0$ that is conditioned to $r_1$,
$\beta$ is the proportion on $S_1 \cap S_0$ that is conditioned to
$r_1$, and $\eta$ is the relative measure of the intersection.
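The decomposition above can be checked on a small example. The following is a minimal sketch, assuming a counting measure $m(X) = |X|$ and arbitrary illustrative sets; the function name `response_probability` and the particular elements are hypothetical and do not come from Bush and Mosteller.

```python
from fractions import Fraction


def response_probability(S1, S0, C):
    """Return (p, alpha, beta, eta) for a counting measure m(X) = |X|.

    S1, S0: sets of stimulus elements for the two presentations.
    C: subset of S1 that is conditioned to response r1.
    """
    inter = S1 & S0          # elements common to both presentations
    only1 = S1 - S0          # elements peculiar to s1
    T = C & only1            # conditioned elements outside the intersection
    I = C & inter            # conditioned elements inside the intersection
    eta = Fraction(len(inter), len(S1))      # similarity index
    alpha = Fraction(len(T), len(only1))     # proportion conditioned on S1 - S0
    beta = Fraction(len(I), len(inter))      # proportion conditioned on the intersection
    p = Fraction(len(C), len(S1))            # Pr(r1 | s1)
    return p, alpha, beta, eta


# Illustrative (assumed) sets: 10 elements in S1, 4 of them shared with S0.
S1 = set(range(10))
S0 = set(range(6, 14))
C = {0, 1, 2, 6, 7}
p, alpha, beta, eta = response_probability(S1, S0, C)
print(p, alpha * (1 - eta) + beta * eta)
```

Exact fractions are used so that the two sides of $p = \alpha(1 - \eta) + \beta\eta$ can be compared without rounding.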

When $s_1$ is presented, elements of $S_1 - S_0$ become conditioned to
$r_1$ or to $r_0$, depending on the outcome of the response that
occurs. Linear operators are applied to $\alpha$, and when $r_1$ is the
rewarded response on $s_1$ trials the ratio $\alpha$ tends to 1, which
simply means that all elements of $S_1 - S_0$ are asymptotically
conditioned to $r_1$. A similar argument applies to the elements of
$S_0 - S_1$. However, the elements in the intersection are conditioned
to $r_1$ on $s_1$ trials and conditioned to $r_0$ on $s_0$ trials, and
so $\beta$ does not tend to 1. With $\alpha = 1$, the expression for
$p$ becomes

$$p = (1 - \eta) + \beta\eta,$$

which falls short of 1 as long as $\eta > 0$ and $\beta < 1$.
Therefore, to obtain perfect learning ($p_n \to 1$), it is assumed that
the measure of the intersection $S_1 \cap S_0$ decreases toward 0 as
learning progresses. This assumption is meant to capture the notion
that an animal pays less and less attention to the stimulus elements in
the intersection. A “discrimination operator,” $D$, is defined by

$$D\eta = k\eta \qquad (0 < k < 1).$$

This definition raises the critical question: When should the operator
be applied? Bush and Mosteller proposed the naive answer that $D$
should be applied whenever an $s_1$ presentation follows an $s_0$
presentation, or vice versa. Thus a shift from one type of trial to the
other reduces the importance of the common stimulus elements,
regardless of what else occurs on those adjacent trials. For this
notion to be at all sensible, we must assume that the animal (1) is
“aware” of the shift and so “knows” how to distinguish $S_1$ from $S_0$
and (2) can “identify” the elements in the intersection in order to
“know” what to pay less attention to in the future. Why should such an
intelligent organism let the common elements influence him at all once
he has seen both $S_1$ and $S_0$ just once? An elaborate sampling
scheme might weaken the force of this criticism, but there are other
problems.

Consider the limiting case when $S_1$ and $S_0$ are identical, that
is, $\eta = 1$. The experiment reduces to one of simple learning with
partial reinforcement; it is well known that learning does occur in
such experiments. However, if the discrimination operator $D$ is
applied as suggested, the measures of $S_1$ and $S_0$ must both
decrease to 0 and asymptotically the animal pays attention to nothing.
Bush and Mosteller attempted to escape from this paradox by assuming
that the operator $D$ is an identity operator when $S_1 = S_0$; thus
$D$ depends on the initial value of $\eta$.

In addition to the conceptual difficulties just cited, there are
mathematical problems also. The model leads to a complex stochastic
process whose properties are virtually unknown. Even if we make the
simplifying assumption that reward of one response and nonreward of the
other response have effects of the same magnitude, there is trouble. In
this experimenter-controlled case, we have

$$\alpha_{n+1} = \begin{cases} \alpha_n + \theta(1 - \alpha_n) & \text{if } s_1 \text{ on trial } n, \\ \alpha_n & \text{if } s_0 \text{ on trial } n, \end{cases}$$

and

$$\beta_{n+1} = \begin{cases} \beta_n + \theta(1 - \beta_n) & \text{if } s_1 \text{ on trial } n, \\ \beta_n - \theta\beta_n & \text{if } s_0 \text{ on trial } n. \end{cases}$$

These lead to

$$p_{n+1} = \begin{cases} p_n + \theta(1 - p_n) & \text{if } s_1 \text{ on trial } n, \\ p_n - \theta\eta_n\beta_n & \text{if } s_0 \text{ on trial } n. \end{cases}$$

From this transition rule, it can be seen that even if $\eta_n$ were
constant, the process is not path independent in the $p_n$'s, because
$p_{n+1}$ depends on $\beta_n$ as well as on $p_n$. If this is not
devastating enough, note that $\eta_{n+1}$ depends not only on $\eta_n$
but also on the stimulus presentation that occurred on trial $n - 1$.
Such a path-dependent stochastic process has not been subjected to
mathematical analysis, presumably because it is very difficult.


1.2 The Restle Model

A similar, but somewhat different, model was described by Restle
(1955). Each stimulus element or “cue,” as Restle called them, is either a
“relevant” cue or an “irrelevant” cue. In the notation of Sec. 1.1, we
can think of irrelevant cues as located in the intersection
$S_1 \cap S_0$ and of relevant cues as in the remainder of
$S_1 \cup S_0$. Restle, however, did not distinguish between the
relevant cues in $S_1$ and those in $S_0$. Similarly, he did not
distinguish between a correct response, given $s_1$, and a correct
response, given $s_0$. It is clear from Restle's later publications
that his notion of a cue is somewhat different from Estes' notion of a
“stimulus element,” which was used in the Bush-Mosteller model
previously described. In his book, Restle (1961) replaced the word
“cue” with the term “situational variable” whose values are called
“aspects.” Examples are color, position, shape, etc. Thus situational
variables present on $s_1$ trials are also present on $s_0$ trials, and
vice versa, although the aspects are different on the two kinds of
trials. Therefore it seems more consistent with the spirit of Restle's
theorizing to think of a single set of cues, or aspects, that is
partitioned into relevant and irrelevant cues.

Restle's model also uses a concept of conditioning that is somewhat
different from that of other models. Restle says that “a conditioned
cue is one which the subject knows how to use in getting reward.” This
is a more cognitive notion than the idea of an S-R connection. In his
theory, Restle does not speak of a cue as being conditioned to a
particular response but merely as being conditioned, that is, in the
sense of being useful in decision making. The specific conditioning
axiom concerns the probability $c(n)$ that a relevant cue is
conditioned at the start of trial $n$. Restle assumes that

$$c(n + 1) = c(n) + \theta[1 - c(n)],$$

where $\theta < 1$ is a rate parameter. Thus, if $c(1) = 0$, it follows
that

$$c(n + 1) = 1 - (1 - \theta)^n.$$

The second process assumed in this model is the “adaptation” of
irrelevant cues (called “suppression” in Restle's later writings). Once
an irrelevant cue is adapted, or suppressed, it no longer has any
influence on behavior. If $a(n)$ is the probability that an irrelevant
cue has been adapted by the start of trial $n$, Restle's perceptual
axiom is

$$a(n + 1) = a(n) + \theta[1 - a(n)].$$

Note that the same parameter $\theta$ is involved both in the
conditioning axiom and in the perceptual axiom. From the latter it
follows, if $a(1) = 0$, that

$$a(n + 1) = 1 - (1 - \theta)^n.$$

The third major axiom relates the probability $p(n)$ of a correct
response on trial $n$ to the probabilities of conditioning and
adaptation. It is assumed that unconditioned relevant cues and
unsuppressed irrelevant cues lead to the correct and incorrect response
with equal probability. If $r$ is the number of relevant cues and $i$
is the number of irrelevant ones, then the axiom is

$$p(n) = \frac{r\,c(n) + \tfrac{1}{2}r[1 - c(n)] + \tfrac{1}{2}i[1 - a(n)]}{r + i[1 - a(n)]}.$$

The denominator is the expected total number of unsuppressed cues; the
numerator is the expected number of conditioned cues plus one-half the
expected number of cues that have been neither conditioned nor
suppressed.

A fourth axiom, introduced for reasons of simplification without a
clear intuitive justification, is

$$\theta = \frac{r}{r + i}.$$

The four axioms together yield, after algebraic manipulations, the
final result, for $c(1) = a(1) = 0$,

$$p(n) = 1 - \frac{1}{2}\,\frac{(1 - \theta)^{n-1}}{\theta + (1 - \theta)^n}.$$

This result is very simple. It implies a nonbranching stochastic
process that consists of a sequence of Bernoulli trials whose
parameters $p(n)$ are related by a simple function with a single
parameter $\theta$. No one can complain about the complexity of the
result. It is obtained, however, by making a number of strong
simplifying assumptions. Why, for example, should the conditioning rate
and suppression rate be equal and why should they both equal the
proportion of relevant cues? However, even if these specific
assumptions are dropped, one can still obtain an explicit formula for
$p(n)$ in terms of $n$, with more parameters now instead of only one.
Although the equation would be unsightly, the fact remains that the
model predicts a unique sequence $\{p(n)\}$. This is a consequence of
the basic structure of the model; the probability $p(n)$ does not
depend on which stimulus presentation occurred on trial $n - 1$ nor on
which response was made. Such a powerful assumption should be testable
directly, but no such experimental tests have yet been made. On the
face of it, it seems implausible if not patently false. Consider, for
example, a variety of presentation schedules, especially some extreme
ones. First let $P = \tfrac{1}{2}$ and consider the sequences
$\{s_1, s_0, s_1, s_0, \ldots\}$ and
$\{s_1, s_1, \ldots, s_1, s_0, \ldots, s_0\}$. With the alternating
one, a monotone increase in $p(n)$ is not too unreasonable, but with
the second displayed sequence, it seems highly likely that $p(n)$ will
increase monotonically during the $s_1$ presentations and then undergo
a large decrease on the first $s_0$ trial. Second, let $P$ be very
large or very small. Should this lead to the same sequence $\{p(n)\}$
as schedules with $P = \tfrac{1}{2}$? The difficulty is abundantly
clear: response probabilities cannot be independent of the presentation
schedules.
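The algebra that leads from the four axioms to the closed form can be checked numerically. The sketch below is only an illustration under assumed values ($r = 3$ relevant and $i = 7$ irrelevant cues); the function names are hypothetical. It evaluates the third axiom directly and compares it with the final formula.

```python
def restle_from_axioms(n, r, i):
    """p(n) computed step by step from the four axioms, with c(1) = a(1) = 0."""
    theta = r / (r + i)                      # fourth axiom
    c = 1.0 - (1.0 - theta) ** (n - 1)       # conditioning axiom, solved
    a = 1.0 - (1.0 - theta) ** (n - 1)       # perceptual axiom, solved
    numerator = r * c + 0.5 * r * (1.0 - c) + 0.5 * i * (1.0 - a)
    denominator = r + i * (1.0 - a)
    return numerator / denominator           # third axiom


def restle_closed_form(n, theta):
    """p(n) = 1 - (1/2)(1 - theta)^(n-1) / [theta + (1 - theta)^n]."""
    return 1.0 - 0.5 * (1.0 - theta) ** (n - 1) / (theta + (1.0 - theta) ** n)


r, i = 3, 7                                  # assumed cue counts, for illustration
theta = r / (r + i)
for n in range(1, 11):
    print(n, restle_from_axioms(n, r, i), restle_closed_form(n, theta))
```

Note that the schedule of presentations appears nowhere in either function, which is exactly the property criticized in the text.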


1.3 The Wyckoff Model

Wyckoff (1952) was much influenced by the thinking of K. W. Spence and
of C. J. Burke on the problem of discrimination learning. “Attending
responses,” or “orienting responses,” or “observing responses” play a
central role in their theories; Wyckoff built this concept into a
model, or at least the formal structure of a model. He did essentially
no mathematical analysis, however.

On each trial of an experiment, an animal is presumed first to make an
attending or observing response. An overt motor response follows and is
recorded. As Wyckoff suggested, if the discriminative stimulus is
placed on the roof of the start alley, then the response of lifting the
head is necessary for the animal to observe that stimulus. The recorded
response is then, for example, a left or right turn in a T-maze. The
abstract process is as follows. The animal begins in state $s_1$ or
$s_0$, depending on which stimulus is presented by the experimenter.
Then with some probability $u$ it makes attending response $a_1$ or
$a_0$, or, with probability $1 - u$, does not make such a response
(state $\bar{a}$). Depending on which of the three states the rat is
then in, it makes response $r_1$ with probability $x$, $y$, or $z$, or
response $r_0$ with the complementary probability. We thus have four
basic conditional probabilities:

$$u = \Pr(a_1 \mid s_1) = \Pr(a_0 \mid s_0),$$
$$x = \Pr(r_1 \mid a_1),$$
$$y = \Pr(r_1 \mid a_0),$$
$$z = \Pr(r_1 \mid \bar{a}).$$

It follows at once that

$$p = \Pr(r_1 \mid s_1) = ux + (1 - u)z,$$
$$q = \Pr(r_1 \mid s_0) = uy + (1 - u)z.$$

Presumably, all the conditional probabilities change from trial to
trial. If reward always follows $r_1$ and never follows $r_0$ when
$s_1$ is presented, and the reverse occurs when $s_0$ is presented,
then $x$ should increase whenever attending response $a_1$ is made and
not change otherwise, and $y$ should decrease when $a_0$ is made and
not change otherwise. But $z$ should either increase or decrease when
$\bar{a}$ occurs, depending on whether $s_1$ or $s_0$ is presented. The
remaining question concerns changes in the attending probability $u$.
Following Spence, Wyckoff argues that $u$ should increase when an
attending response is followed by reward later in the chain and
decrease when it is not. Similarly, $u$ should decrease when $\bar{a}$
is rewarded and increase when $\bar{a}$ is not rewarded. The signs of
the increments in the four conditional probabilities are shown in
Table 1.

[Table 1: signs of the increments in $u$, $x$, $y$, and $z$.]

We first ask whether perfect learning is possible in this model. From
Table 1, we see that $\Delta x$ is either positive or zero and that
$\Delta y$ is negative or zero. With reasonable transition rules, we
expect that $x \to 1$ and $y \to 0$ as learning progresses provided
that $P \neq 0$ and $u \neq 0$. If this is true, then

$$p \to u + (1 - u)z,$$
$$q \to (1 - u)z,$$

and we see that $p \to 1$ and $q \to 0$ only if $u \to 1$, no matter
what value $z$ has. It may be helpful to note that

$$p - q = u(x - y),$$

and that when $x \to 1$ and $y \to 0$, then

$$p - q \to u.$$

For perfect learning, $p - q \to 1$, which again requires that
$u \to 1$. From Table 1 we see that $\Delta u$ is either positive or
negative, depending on the event, and so it is not clear whether it is
possible for $u \to 1$.
To investigate further the asymptotic properties of the Wyckoff class
of models, we must assume specific transition laws. Consider linear
operators such that whenever some probability $w_n$ increases,

$$w_{n+1} = w_n + \theta(1 - w_n),$$

and when it decreases,

$$w_{n+1} = w_n - \theta w_n.$$

Like Restle, we assume that the same parameter $\theta$ is involved in
all the operators. To simplify things further, consider only equal
presentation probabilities, $P = 1 - P = \tfrac{1}{2}$. The model then
takes the form

$$x_{n+1} = \begin{cases} x_n + \theta(1 - x_n) & \text{with prob. } u_n/2, \\ x_n & \text{with prob. } 1 - u_n/2, \end{cases}$$

$$y_{n+1} = \begin{cases} y_n - \theta y_n & \text{with prob. } u_n/2, \\ y_n & \text{with prob. } 1 - u_n/2, \end{cases}$$

$$u_{n+1} = \begin{cases} u_n + \theta(1 - u_n) & \text{with prob. } \pi_n, \\ u_n - \theta u_n & \text{with prob. } 1 - \pi_n, \end{cases}$$

where

$$\pi_n = \tfrac{1}{2}[1 + u_n(x_n - y_n)].$$

Unless $u_1 = 0$, it seems evident that $x_n \to 1$ and $y_n \to 0$,
although no rigorous proof of this is given here. When $x_n = 1$ and
$y_n = 0$, we have

$$\pi_n = \tfrac{1}{2}(1 + u_n),$$

and so

$$E(u_{n+1} \mid u_n) = (1 - \theta)u_n + \frac{\theta}{2}(1 + u_n) = u_n + \frac{\theta}{2}(1 - u_n).$$

Under these conditions, the conditional expectation increases on each
trial and so it is plausible that $u_n \to 1$.

The plausibility arguments just presented in no way constitute a proof,
but they are consistent with Wyckoff's conjecture that his model does
lead to probability values of 1 for correct responses. One would like
to prove that this conjecture is true under more general conditions,
but the mathematics is difficult.

The stochastic process implied by Wyckoff's model is very complex, even
when linear operators and equal rate parameters are assumed. This can
be seen by writing $p_{n+1}$ in terms of the quantities on trial $n$.
If, for example, the event $s_1 a_1 r_1$ occurs, then

$$p_{n+1} = u_{n+1}x_{n+1} + (1 - u_{n+1})z_{n+1}$$
$$= [u_n + \theta(1 - u_n)][x_n + \theta(1 - x_n)] + (1 - \theta)(1 - u_n)z_n.$$

Simplifications yield

$$p_{n+1} = p_n + \theta(1 - p_n) - \theta(1 - \theta)(1 - u_n)(1 - x_n),$$

and so $p_{n+1}$ depends not only on $p_n$ but also on $u_n$ and $x_n$.
Similar conclusions are reached for the other seven events. Detailed
mathematical analyses of stochastic processes of such complexity are
certain to be most difficult, if not impossible.
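The linear version of the Wyckoff model is easy to simulate even though it resists analysis. The following is a rough sketch, assuming equal presentation probabilities, a single rate parameter, and arbitrary starting values; the function names and default arguments are hypothetical. It also evaluates the conditional expectation of $u_{n+1}$ by enumerating the two branches of the operator on $u$.

```python
import random


def prob_u_increases(u, x, y):
    """pi_n = (1/2)[1 + u(x - y)] for equal presentation probabilities."""
    return 0.5 * (1.0 + u * (x - y))


def expected_next_u(u, x, y, theta):
    """E(u_{n+1} | u_n, x_n, y_n) from the two branches of the linear operator."""
    pi = prob_u_increases(u, x, y)
    return pi * (u + theta * (1.0 - u)) + (1.0 - pi) * (u - theta * u)


def simulate(theta, trials, u=0.2, x=0.5, y=0.5, z=0.5, seed=0):
    """Run one simulated animal; starting values are arbitrary assumptions."""
    rng = random.Random(seed)
    for _ in range(trials):
        s1 = rng.random() < 0.5                  # which stimulus is presented
        attend = rng.random() < u                # attending response made?
        prob_r1 = (x if s1 else y) if attend else z
        r1 = rng.random() < prob_r1              # overt response
        rewarded = (r1 == s1)                    # r1 correct on s1, r0 on s0
        if attend:
            if s1:
                x += theta * (1.0 - x)           # x increases whenever a1 occurs
            else:
                y -= theta * y                   # y decreases whenever a0 occurs
            u = u + theta * (1.0 - u) if rewarded else u - theta * u
        else:
            z = z + theta * (1.0 - z) if s1 else z - theta * z
            u = u - theta * u if rewarded else u + theta * (1.0 - u)
    return u, x, y, z


print(expected_next_u(0.6, 1.0, 0.0, 0.1))       # compare with u + (theta/2)(1 - u)
print(simulate(theta=0.1, trials=2000))
```

A single run proves nothing about the asymptote, of course; the sketch is meant only to make the transition rules concrete and to let the reader experiment with other parameter values.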


1.4 Stimulus-Sampling Approaches

The three models just described do not exhaust the published
literature on dual-process models of discrimination learning. Another
major class of such models arose out of stimulus sampling theory. One
begins with a finite population of stimulus elements that are sampled
by a learning organism according to one set of axioms and become
conditioned to his responses according to another set of axioms. Finite
state Markov chains usually result. Chapter 10 presents an account of
stimulus sampling theory in general, and Sec. 5 of that chapter is
devoted to its application to discrimination learning.
2 SINGLE-PROCESS MODELS

In the preceding section, three examples of dual-process models for
identification learning were presented. The Restle model led to serious
conceptual problems, the Wyckoff model led to major mathematical
difficulties, and the Bush-Mosteller model exhibited some of each. It
seems fair to conclude that the complications arise out of the attempt
to include both conditioning and perceptual mechanisms in a theory
having continuous operators. One direction of simplification, briefly
mentioned in Sec. 1.4, is to replace continuous operators with discrete
ones, thereby obtaining Markov models. Another direction is to
eliminate the perceptual mechanism, regardless of its intuitive appeal.
If “discrimination learning,” as well as learning in recognition
experiments, can be described by models that contain only conditioning
mechanisms, then it would be hard to justify the complications that
arise from the inclusion of a second process. This possibility has not
been explored systematically.

Unlike the previous models, no intermediate hypothetical states are
assumed; no use is made of stimulus elements or of observing states.
The process is completely described by two sequences of random
variables:

$$Z_n = \begin{cases} 1 & \text{if } s_1 \text{ is presented on trial } n, \\ 0 & \text{if } s_0 \text{ is presented on trial } n, \end{cases}$$

$$X_n = \begin{cases} 1 & \text{if response } r_1 \text{ is made on trial } n, \\ 0 & \text{if response } r_0 \text{ is made on trial } n. \end{cases}$$

Then, our previously defined probabilities are

$$P = \Pr(Z_n = 1),$$
$$p_n = \Pr(X_n = 1 \mid Z_n = 1),$$
$$q_n = \Pr(X_n = 1 \mid Z_n = 0).$$

The models are developed by first stating a general axiom and then
introducing plausible conditions that restrict it.


2.1 Path Independence

Experience with learning models has shown that they are rarely
manageable unless an independence-of-path assumption is made. So this
assumption is made here. Specifically, we assume that $p_{n+1}$ and
$q_{n+1}$ depend only on $p_n$ and $q_n$, respectively, and on the
events that occurred on trial $n$. Formally we have the

Basic Axiom.

$$p_{n+1} = f(p_n; Z_n, X_n), \qquad q_{n+1} = f^*(q_n; Z_n, X_n),$$

where $f$ and $f^*$ are continuous, monotone increasing functions of
$p_n$ and $q_n$, respectively.

Two examples, which will be discussed in detail in later sections, are
shown in Figs. 4 and 5. The first is the linear model and the second is
the beta model (see Chapter 9).

[Fig. 4. Probability transformations in the linear model.]

[Fig. 5. Probability transformations in the beta model.]

For convenience, we use the operator notation,

$$p_{n+1} = T_{ij}\,p_n,$$
$$q_{n+1} = T^*_{ij}\,q_n,$$

where $i = 1, 0$ if $Z_n = 1, 0$, respectively, and $j = 1, 0$ if
$X_n = 1, 0$, respectively. Thus the first subscript denotes the
presentation and the second subscript, the response on trial $n$. In
most cases of interest, the limits

$$\lambda_{ij} = \lim_{n \to \infty} T_{ij}^{\,n}\,p$$

exist and have values of 1 or 0. (Identity operators sometimes arise.)

The models to be considered are more than trivial only because we allow
$p_n$, the probability of $r_1$ given $s_1$, to change not only on
$s_1$ trials but also on $s_0$ trials; similarly, $q_n$ in general
changes on both $s_1$ and $s_0$ trials. At first glance, this may seem
unreasonable, and on second thought, it may appear that two separate
processes are being introduced in spite of the title of this section.
But, as will be seen shortly, if one chooses to call them different
processes, at least they are very similar.

A change in the probability of $r_1$, given $s_i$, during a trial on
which $s_{i'}$ is presented is called “direct conditioning” when
$i' = i$ and “generalized conditioning” when $i' \neq i$. These two
kinds of operators are listed below.


Presentation   Response   Direct Conditioning   Generalized Conditioning
                          Operator              Operator

$s_1$          $r_1$      $T_{11}$              $T^*_{11}$
$s_1$          $r_0$      $T_{10}$              $T^*_{10}$
$s_0$          $r_1$      $T^*_{01}$            $T_{01}$
$s_0$          $r_0$      $T^*_{00}$            $T_{00}$

The conditions presented next place restrictions on this set of eight
operators.

2.2 Outcome Preference

It is assumed throughout this chapter that the presentation-response
events $s_1 r_1$ and $s_0 r_0$ always lead to an outcome $o_1$ that is
preferred to the other outcome $o_0$, which always follows the events
$s_1 r_0$ and $s_0 r_1$. Thus we speak of $r_1$ as “correct” when $s_1$
is presented and of $r_0$ as “correct” when $s_0$ is presented. In
accordance with the law of effect, we assume that whenever the
preferred outcome $o_1$ occurs, it produces an increase in the
probability of the response that just preceded it. This assumption
applies both to the direct conditioning effects and to the generalized
conditioning effects. It is assumed that the nonpreferred outcome $o_0$
decreases the probability of the response that just preceded it. This
argument yields

Condition 1. For all $p$ and $q$,

$$T_{11}\,p \ge p, \qquad T^*_{11}\,q \ge q,$$
$$T_{10}\,p \ge p, \qquad T^*_{10}\,q \ge q,$$
$$T^*_{01}\,q \le q, \qquad T_{01}\,p \le p,$$
$$T^*_{00}\,q \le q, \qquad T_{00}\,p \le p.$$

The direct-conditioning part of this condition is not controversial; it
stems directly from reinforcement theory. The generalized-conditioning
part, however, is a more substantive assumption. It says that if a
response is reinforced in one stimulus situation, its probability is
also increased in some other situation (or at least not decreased) as
long as the two situations are “similar” in some respects. This
assumption seems plausible and is consistent with the known facts about
stimulus generalization.


2.3 Generalization Decrements

It was just argued that the generalized effects of reinforcement should
be positive and that the generalized effects of nonreinforcement should
be negative. In addition, we expect that the generalized effects are
smaller in magnitude than the direct conditioning effects. In all
generalization studies, generalization decrements are observed. Thus we
have

Condition 2. For all $u$,

$$|T^*_{11}\,u - u| \le |T_{11}\,u - u|,$$
$$|T^*_{10}\,u - u| \le |T_{10}\,u - u|,$$
$$|T_{01}\,u - u| \le |T^*_{01}\,u - u|,$$
$$|T_{00}\,u - u| \le |T^*_{00}\,u - u|.$$

This condition, along with the previous one about outcome preferences,
leads to a small conclusion.

Lemma 1. If Conditions 1 and 2 are satisfied, then for all $u$ and for
$i, j = 1, 0$, $T_{ij}\,u \ge T^*_{ij}\,u$.

PROOF. From the first two inequalities of Condition 2 and the first
four inequalities of Condition 1 it follows immediately that
$T_{11}\,u \ge T^*_{11}\,u$ and $T_{10}\,u \ge T^*_{10}\,u$. According
to the last four inequalities of Condition 1, the quantities within the
absolute value signs in the last two inequalities of Condition 2 are
negative. Thus, when the absolute value signs are removed, the
inequality signs are reversed, giving $T^*_{01}\,u \le T_{01}\,u$ and
$T^*_{00}\,u \le T_{00}\,u$. Q.E.D.

In Sec. 5, this lemma leads to an important theorem about the
asymptotic behavior of a version of the beta model.


2.4 Outcome Consistency

So far nothing has been said about the relative effects of the two
outcomes, reward and nonreward ($o_1$ and $o_0$, respectively). We do
not want a general condition that requires one of them to be more
effective than the other, because there is convincing experimental
evidence that either outcome can be more effective, depending on the
type of experiment or particular experimental conditions. For example,
without assuming a particular model, Sternberg has shown (see Sec. 6.7
of Chapter 9) that avoidance is more effective than escape in a shuttle
box experiment, but that reward is less effective than nonreward in a
T-maze reversal experiment. However, for a particular experiment and
with all experimental conditions held fixed, we expect reward to be
uniformly more effective than nonreward, or the reverse. In other
words, if direct reward is more effective than direct nonreward with
one of the stimulus presentations, we expect the same to be true with
the other presentation, and we expect the generalized effects to be
ordered in the same way as the direct effects. These requirements are
together embodied in

Condition 3. For all $p$, $q$, $p'$, $q'$, either

(i) $T_{11}\,p \ge T_{10}\,p$, $T^*_{11}\,q \ge T^*_{10}\,q$,
$T_{00}\,p' \le T_{01}\,p'$, $T^*_{00}\,q' \le T^*_{01}\,q'$,

or

(ii) $T_{11}\,p \le T_{10}\,p$, $T^*_{11}\,q \le T^*_{10}\,q$,
$T_{00}\,p' \ge T_{01}\,p'$, $T^*_{00}\,q' \ge T^*_{01}\,q'$.

From the three conditions we obtain

Lemma 2. If Conditions 1, 2, and 3 are satisfied, then one and only one
of the following is true for all $u$:

(a) $T_{11}\,u \ge T^*_{11}\,u \ge T_{10}\,u \ge T^*_{10}\,u \ge u$,
(b) $T_{11}\,u \ge T_{10}\,u \ge T^*_{11}\,u \ge T^*_{10}\,u \ge u$,
(c) $T_{10}\,u \ge T^*_{10}\,u \ge T_{11}\,u \ge T^*_{11}\,u \ge u$,
(d) $T_{10}\,u \ge T_{11}\,u \ge T^*_{10}\,u \ge T^*_{11}\,u \ge u$.

Also, one and only one of the following is true:

(a') $T^*_{00}\,u \le T_{00}\,u \le T^*_{01}\,u \le T_{01}\,u \le u$,
(b') $T^*_{00}\,u \le T^*_{01}\,u \le T_{00}\,u \le T_{01}\,u \le u$,
(c') $T^*_{01}\,u \le T_{01}\,u \le T^*_{00}\,u \le T_{00}\,u \le u$,
(d') $T^*_{01}\,u \le T^*_{00}\,u \le T_{01}\,u \le T_{00}\,u \le u$.

Furthermore, if and only if (a) or (b) is true, then (a') or (b') is
true; if and only if (c) or (d) is true, then (c') or (d') is true.

PROOF. Statements (a), (b), (a'), and (b') follow from Lemma 1 and
part (i) of Condition 3. Statements (c), (d), (c'), and (d') follow
from Lemma 1 and part (ii) of Condition 3. Q.E.D.

Lemma 2 leaves several possibilities open, and the choice among them
depends on the particular experimental design. Part (i) of Condition 3
represents the case for which reward is more effective than nonreward
and this is reflected in orderings (a), (b), (a'), and (b') of Lemma 2.
The converse case is described by (ii) of Condition 3 and orderings
(c), (d), (c'), and (d'). Another distinction among the possible
orderings has an equally simple interpretation: ordering (a), for
example, implies that generalized reward is more effective than direct
nonreward, whereas ordering (b) implies the converse. In this way, we
see that orderings (a), (a'), (c), and (c') occur when the
generalization is large and that orderings (b), (b'), (d), and (d')
exist when the generalization is small. These interpretations lead to
the following table.

                             Amount of Generalization
                             Large         Small

Reward more effective        (a), (a')     (b), (b')
Nonreward more effective     (c), (c')     (d), (d')

Whether reward or nonreward is more effective and whether the
generalization effects are large or small appear to be parametric
questions, and so we do not want to restrict the models further without
recourse to facts. One additional condition may be appropriate for some
experiments and so it is described next.

2.5 Symmetry

Many experimental designs possess an intrinsic symmetry. In a T-maze
discrimination-learning apparatus, for example, the left-hand and the
right-hand alleys are of the same length and have similar goal boxes.
The stimulus presentations are usually selected such that they are
initially “neutral” to the animals. And the payoff matrix is symmetric
by assumption. Although response biases may exist (the rat may be
left-handed, for example), it seems more natural to describe these by
initial response probabilities different from $\tfrac{1}{2}$ rather
than by conditions on the event operators.

If the experimenter's attempts to produce a symmetric two-choice
situation are successful, this should permit us to simplify a model.
Reward of response $r_1$ with presentation $s_1$ should have the same
effect on the strength of $r_1$ as reward of $r_0$ with $s_0$ has on
the strength of $r_0$. The same remark should apply to nonreward, and
both remarks should apply to the generalized effects as well as to the
direct effects. Another way of putting it is that the model should be
invariant to a simultaneous relabeling of the presentations and
responses. If $s_1$ and $s_0$ are interchanged, $r_1$ and $r_0$
interchanged, and all probabilities replaced by their respective
complements, then no changes in predictions should result. The
following condition formulates this requirement.

Condition 4. For $i = 1, 0$, for $j = 1, 0$, and for all $u$,

$$T_{ij}\,u = 1 - T^*_{1-i,\,1-j}(1 - u).$$

One example of this condition may help to clarify its meaning. Suppose
that at some point in learning we have $q = 1 - p$. Now if $s_0$ is
presented and $r_0$ occurs, $q$ decreases to $T^*_{00}\,q$. On the
other hand, if $s_1$ is presented and $r_1$ occurs, $p$ increases to
$T_{11}\,p$. Condition 4 says that $T^*_{00}\,q = 1 - T_{11}\,p$ in
this example.

Condition 4 reduces the system of eight operators to only four
operators and this is a major simplification. The model builder would
want strong evidence against this condition before he would be willing
to give it up; four operators each with one or more parameters is quite
enough to handle! This symmetry condition simplifies the statements in
Lemma 2. We have

Lemma 3. If Condition 4 is met, orderings (a), (b), (c), and (d) of
Lemma 2 imply and are implied by orderings (a'), (b'), (c'), and (d'),
respectively, of that lemma.

PROOF. If Condition 4 is imposed on each of the quantities in ordering
(a'), we obtain

$$1 - T_{11}\,v \le 1 - T^*_{11}\,v \le 1 - T_{10}\,v \le 1 - T^*_{10}\,v \le 1 - v,$$

where $v = 1 - u$. This leads at once to ordering (a). The other cases
are shown in the same manner. Q.E.D.
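Condition 4 and Lemma 3 can be illustrated with a concrete family of operators. The sketch below assumes linear operators with made-up rate parameters chosen to satisfy ordering (a); these rates, and the helper names `increasing` and `mirror`, are hypothetical and are not part of the general theory. The four $s_0$-trial operators are then generated from the four $s_1$-trial operators by Condition 4.

```python
def increasing(rate):
    """Linear operator u -> u + rate * (1 - u), which never decreases u."""
    return lambda u: u + rate * (1.0 - u)


def mirror(op):
    """Condition 4 partner of an operator: u -> 1 - op(1 - u)."""
    return lambda u: 1.0 - op(1.0 - u)


# Assumed rates satisfying ordering (a): T11 >= T*11 >= T10 >= T*10 >= u.
T11, T11_star = increasing(0.30), increasing(0.20)
T10, T10_star = increasing(0.15), increasing(0.05)

# Condition 4: T_ij u = 1 - T*_{1-i,1-j}(1 - u), and conversely.
T00_star, T00 = mirror(T11), mirror(T11_star)
T01_star, T01 = mirror(T10), mirror(T10_star)


def ordering_a(u, eps=1e-12):
    return T11(u) + eps >= T11_star(u) >= T10(u) - eps >= T10_star(u) - 2 * eps >= u - 3 * eps


def ordering_a_prime(u, eps=1e-12):
    return T00_star(u) - eps <= T00(u) <= T01_star(u) + eps <= T01(u) + 2 * eps <= u + 3 * eps


grid = [k / 20 for k in range(21)]
print(all(ordering_a(u) for u in grid), all(ordering_a_prime(u) for u in grid))
```

For this family the mirrored operators are simply $u \to (1 - \text{rate})\,u$, so ordering (a') can also be read off by inspection; the small tolerances only guard against floating-point rounding.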
We may not always wish to impose Condition 4 on a model, but when we do
the number of parameters is cut in half. Under such a condition, it is
often convenient to deal with the four operators $T_{11}$,
$T^*_{11}$, $T_{10}$, and $T^*_{10}$, all of which always increase
their operands except when they are identity operators. In principle,
it is possible to analyze data to see if Condition 4 is approximately
satisfied, but undoubtedly this would encounter serious technical
difficulties.


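2.6 Equal Reward and Nonreward Effects

Detailed mathematical analyses often require further simplification.
Models with experimenter-controlled events are easier to analyze than
other learning models, and so their investigation seems worthwhile even
though they seem to be inappropriate for many experiments. (See Bush &
Mosteller, 1955, Chapter 3, or Sternberg, Chapter 9 of this Handbook,
for a definition and analysis of experimenter-controlled events in
simple learning.) The class of single-process models implied by our
basic axiom has the transition laws

$$p_{n+1} = \begin{cases} T_{11}\,p_n & \text{if } Z_n = 1,\ X_n = 1, \\ T_{10}\,p_n & \text{if } Z_n = 1,\ X_n = 0, \\ T_{01}\,p_n & \text{if } Z_n = 0,\ X_n = 1, \\ T_{00}\,p_n & \text{if } Z_n = 0,\ X_n = 0, \end{cases}$$

$$q_{n+1} = \begin{cases} T^*_{11}\,q_n & \text{if } Z_n = 1,\ X_n = 1, \\ T^*_{10}\,q_n & \text{if } Z_n = 1,\ X_n = 0, \\ T^*_{01}\,q_n & \text{if } Z_n = 0,\ X_n = 1, \\ T^*_{00}\,q_n & \text{if } Z_n = 0,\ X_n = 0. \end{cases}$$

To make the stochastic process independent of the response random
variable $X_n$, we must impose

Condition 5. For all $u$,

$$T_{11}\,u = T_{10}\,u = T_1\,u, \qquad T^*_{11}\,u = T^*_{10}\,u = T^*_1\,u,$$
$$T_{01}\,u = T_{00}\,u = T_0\,u, \qquad T^*_{01}\,u = T^*_{00}\,u = T^*_0\,u.$$

This condition not only reduces the system of eight operators to four
operators (as did Condition 4), but it simplifies the transition laws
to

$$p_{n+1} = \begin{cases} T_1\,p_n & \text{if } Z_n = 1, \\ T_0\,p_n & \text{if } Z_n = 0, \end{cases}$$

$$q_{n+1} = \begin{cases} T^*_1\,q_n & \text{if } Z_n = 1, \\ T^*_0\,q_n & \text{if } Z_n = 0. \end{cases}$$

Because $Z_n$ is a Bernoulli random variable with fixed parameter $P$,
the four operators are applied with constant probabilities. As a
result, one can anticipate clean mathematical results when Condition 5
is imposed. They may not always be realistic, however.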


3. NONGENERALIZATION SINGLE-PROCESS MODELS

A possible class of identification learning models arises from the
preceding framework by assuming that no generalization occurs from $s_1$ to
$s_0$ or from $s_0$ to $s_1$. This is equivalent to assuming that the operators
$T^*_{11}$, $T^*_{10}$, $T^*_{01}$, and $T^*_{00}$ are identity operators. Condition 2 is
thereby satisfied. By imposing Conditions 1 and 3, Lemma 2 holds and it
reduces to

\[ T_{11}u \ge T_{10}u \ge u \quad\text{and}\quad T_{00}u \le T_{01}u \le u, \]

or

\[ T_{10}u \ge T_{11}u \ge u \quad\text{and}\quad T_{01}u \le T_{00}u \le u. \]

The first possibility corresponds to reward being more effective than
nonreward, whereas the second possibility corresponds to the converse.
We see at once that $p_n$ either increases or remains constant on every
trial and that $q_n$ either decreases or remains constant. This leads at once
to

Theorem 1. If for all $u < 1$, $T_{11}u > u$ and $T_{10}u > u$, and for all $u > 0$,
$T_{01}u < u$ and $T_{00}u < u$, then as the number of $s_1$ presentations and the
number of $s_0$ presentations become large, $p_n \to 1$ and $q_n \to 0$.

sislnt occurs and so it is con- 

however tu animal expenments It is inconsistent, 

fromZ e. '.P°"'r prediction also follows 

from the assumption of no generalization. We see that $p_n$ changes only on
$s_1$ trials and $q_n$ only on $s_0$ trials. If we partition the trials
according to which stimulus is presented, if we let $p_k$ be the probability of
$r_1$ on the $k$th presentation of $s_1$, and if we let $q_l$ be the probability of
$r_1$ on the $l$th presentation of $s_0$, then we have

Theorem 2. The two sequences $\{p_k\}$ and $\{q_l\}$ are independent.

This result says that the order of the $s_1$ and $s_0$ presentations is
immaterial. It is as if we were running two completely separate experiments
without any interaction. This prediction, as unlikely as it seems, has not
been tested, and so no data are available to condemn it. An obvious
experimental design is to run one group of rats on a random schedule of
presentations and another group on a schedule in which all the $s_1$
presentations occur prior to all the $s_0$ presentations. The theorem predicts
that the $\{p_k\}$ and $\{q_l\}$ sequences will be statistically identical for the
two groups. If anyone finds this to be true, he will make a major discovery.
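The reason behind Theorem 2 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not part of the chapter: it uses linear operators as one concrete nongeneralization model, the names (`run`, `theta_reward`, `theta_nonreward`) and parameter values are ours, and the responses on $s_1$ trials and on $s_0$ trials are drawn from two separate random streams so that two schedules can be compared presentation by presentation.

```python
import random


def run(schedule, theta_reward, theta_nonreward, p1, q1, seed):
    """Simulate a nongeneralization linear model on a stimulus schedule.

    schedule is a sequence of 1s (s1 presented) and 0s (s0 presented).
    p is Pr(r1 | s1) and q is Pr(r1 | s0).  Two separate random streams
    are used, one per stimulus (an assumption made for this comparison).
    """
    rng_s1 = random.Random(seed)
    rng_s0 = random.Random(seed + 1)
    p, q = p1, q1
    p_seq, q_seq = [p], [q]
    for z in schedule:
        if z == 1:
            x = 1 if rng_s1.random() < p else 0   # r1 is rewarded on s1
            theta = theta_reward if x == 1 else theta_nonreward
            p = p + theta * (1.0 - p)             # q is left unchanged
            p_seq.append(p)
        else:
            x = 1 if rng_s0.random() < q else 0   # r0 is rewarded on s0
            theta = theta_nonreward if x == 1 else theta_reward
            q = q - theta * q                     # p is left unchanged
            q_seq.append(q)
    return p_seq, q_seq


interleaved = [1, 0] * 50          # s1 and s0 alternate
blocked = [1] * 50 + [0] * 50      # all s1 presentations first
seqs_a = run(interleaved, 0.2, 0.1, 0.5, 0.5, seed=1)
seqs_b = run(blocked, 0.2, 0.1, 0.5, 0.5, seed=1)
print(seqs_a == seqs_b)  # prints True
```

Because $p$ is touched only on $s_1$ trials and $q$ only on $s_0$ trials, the two schedules produce the same sequences once the response draws are matched. This coupling argument is a demonstration of the mechanism, not a statistical test of the prediction.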


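4. SINGLE-PROCESS LINEAR MODELS

Up to this point, the functional form of the learning operators has
not been specified; only general conditions on them have been considered.
As a result, few testable predictions about experimental data have
been possible. Therefore in this section and the following one, two special
types of learning operators are considered.

The most extensively studied learning operators are linear in the response
probabilities. The operators $T_{ij}$ introduced in Sec. 2.1 are linear provided

\[ T_{ij}u = (1 - \theta_{ij})u + \theta_{ij}\lambda_{ij}, \]

where $0 \le \lambda_{ij} \le 1$ and $0 \le \theta_{ij} \le 1$, and similarly for the
operators $T^*_{ij}$ with parameters $\theta^*_{ij}$ and $\lambda^*_{ij}$. Conditions 1, 2, and 3 of
Sec. 2 are now imposed on these operators. Condition 1 requires that

\[ \lambda_{11} = \lambda_{10} = \lambda^*_{11} = \lambda^*_{10} = 1, \]
\[ \lambda_{01} = \lambda_{00} = \lambda^*_{01} = \lambda^*_{00} = 0, \]

and so the eight operators are of the form

\[
\begin{aligned}
T_{11}p &= p + \theta_{11}(1 - p), & T^*_{11}q &= q + \theta^*_{11}(1 - q), \\
T_{10}p &= p + \theta_{10}(1 - p), & T^*_{10}q &= q + \theta^*_{10}(1 - q), \\
T^*_{01}p &= p - \theta^*_{01}p, & T_{01}q &= q - \theta_{01}q, \\
T^*_{00}p &= p - \theta^*_{00}p, & T_{00}q &= q - \theta_{00}q.
\end{aligned}
\]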
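As an illustration, one trial of the linear model can be sketched in Python. The sketch assumes operators of the form $Tu = (1 - \theta)u + \theta\lambda$, with $\lambda = 1$ after an $s_1$ presentation and $\lambda = 0$ after an $s_0$ presentation; the function names, the dictionary layout, and the parameter values are hypothetical and chosen only for the example.

```python
def linear_operator(u, theta, lam):
    """Linear operator T u = (1 - theta) * u + theta * lam."""
    return (1.0 - theta) * u + theta * lam


def step(p, q, z, x, theta, theta_star):
    """Apply one trial to (p, q) = (Pr(r1 | s1), Pr(r1 | s0)).

    z is the stimulus presented (1 for s1, 0 for s0) and x the response
    made (1 for r1, 0 for r0).  theta[(i, j)] is the direct-effect
    parameter and theta_star[(i, j)] the generalized-effect parameter
    for stimulus i and response j.
    """
    lam = 1.0 if z == 1 else 0.0
    if z == 1:
        # s1 trial: p receives the direct effect, q the generalized effect
        return (linear_operator(p, theta[(z, x)], lam),
                linear_operator(q, theta_star[(z, x)], lam))
    # s0 trial: q receives the direct effect, p the generalized effect
    return (linear_operator(p, theta_star[(z, x)], lam),
            linear_operator(q, theta[(z, x)], lam))


# Hypothetical parameter values, chosen only for illustration.
theta = {(1, 1): 0.20, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.20}
theta_star = {(1, 1): 0.05, (1, 0): 0.02, (0, 1): 0.02, (0, 0): 0.05}

# A rewarded r1 response to s1, starting from p = q = 0.5.
p, q = step(0.5, 0.5, z=1, x=1, theta=theta, theta_star=theta_star)
```

With these values the direct effect moves $p$ from 0.5 to 0.6, while the smaller generalized effect moves $q$ from 0.5 to 0.525.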
Condition 2, which implies Lemma 1, then requires that

\[ \theta_{11} \ge \theta^*_{11}, \qquad \theta_{10} \ge \theta^*_{10}, \]
\[ \theta_{01} \ge \theta^*_{01}, \qquad \theta_{00} \ge \theta^*_{00}. \]

Condition 3 requires either

(0 

M 011 

Restrictions (i) correspond to reward being more effective than nonreward,
whereas restrictions (ii) correspond to the converse.
Even with the restrictions imposed by Conditions 1, 2, and 3, the model
contains eight parameters, a few too many for detailed data analysis.
Therefore additional restrictions are desirable, two sets of which are
considered.


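4.1 Symmetric Operators

In Y-maze experiments, it seems natural to impose the symmetry
condition, Condition 4. For our linear operators, this condition is
equivalent to

\[ \theta_{11} = \theta_{00}, \qquad \theta_{10} = \theta_{01}, \]
\[ \theta^*_{11} = \theta^*_{00}, \qquad \theta^*_{10} = \theta^*_{01}. \]

Four parameters remain, and so we now examine the asymptotic behavior
of the model. For given values of $p_n$ and $q_n$, the conditional
expectations on the next trial are

\[ E(p_{n+1} \mid p_n, q_n) = p_n + Z_n[p_n\theta_{00} + (1 - p_n)\theta_{01}](1 - p_n)
   - (1 - Z_n)[q_n\theta^*_{01} + (1 - q_n)\theta^*_{00}]p_n, \]

\[ E(q_{n+1} \mid p_n, q_n) = q_n + Z_n[p_n\theta^*_{00} + (1 - p_n)\theta^*_{01}](1 - q_n)
   - (1 - Z_n)[q_n\theta_{01} + (1 - q_n)\theta_{00}]q_n. \]

A general solution is not possible because these expressions are not
linear in $p_n$ and $q_n$. When $p_n = 1$ and $q_n = 0$, however,

\[ E(p_{n+1} \mid p_n = 1, q_n = 0) = 1 - (1 - Z_n)\theta^*_{00}, \]
\[ E(q_{n+1} \mid p_n = 1, q_n = 0) = Z_n\theta^*_{00}. \]

These last equations say that if $p_n = 1$ and $q_n = 0$, then $p_{n+1} = 1$ and
$q_{n+1} = \theta^*_{00}$ when $Z_n = 1$, or $p_{n+1} = 1 - \theta^*_{00}$ and $q_{n+1} = 0$ when $Z_n = 0$.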
In either case, the state $p = 1$ and $q = 0$ is not maintained. On the other
hand, if a new restriction, $\theta^*_{00} = 0$, is introduced, it appears that the state
of perfect learning can be maintained if it is ever reached. These ideas are
now made precise.

Theorem 3. If $\theta_{00}, \theta_{01}, \theta^*_{01} > 0$, a stable asymptotic distribution exists,
but $p_n \to 1$ and $q_n \to 0$ with probability 1 if and only if $\theta^*_{00} = 0$.

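PROOF. Using the method of Lamperti and Suppes (1959), David
Krantz has shown that all the moments of the distribution of $(p_n, q_n)$
converge if $\theta_{00}, \theta_{01}, \theta^*_{01} > 0$. Thus a stable asymptotic distribution
exists. As a result, we can write

\[ E(p_\infty) = \lim_{n\to\infty} E(p_n) = \lim_{n\to\infty} E(p_{n+1}), \qquad
   E(q_\infty) = \lim_{n\to\infty} E(q_n) = \lim_{n\to\infty} E(q_{n+1}), \]
\[ E(p_\infty^2) = \lim_{n\to\infty} E(p_n^2), \qquad
   E(q_\infty^2) = \lim_{n\to\infty} E(q_n^2), \qquad
   E(p_\infty q_\infty) = \lim_{n\to\infty} E(p_nq_n). \]

By taking the expectations over the $(p, q)$ distribution of $E(p_{n+1} \mid p_n, q_n)$
and $E(q_{n+1} \mid p_n, q_n)$ and then taking limits, we obtain

\[ 0 = Z_n\{\theta_{01} + (\theta_{00} - 2\theta_{01})E(p_\infty) + (\theta_{01} - \theta_{00})E(p_\infty^2)\}
     - (1 - Z_n)\{\theta^*_{00}E(p_\infty) + (\theta^*_{01} - \theta^*_{00})E(p_\infty q_\infty)\}, \]

\[ 0 = Z_n\{\theta^*_{01} + (\theta^*_{00} - \theta^*_{01})E(p_\infty) - \theta^*_{01}E(q_\infty) + (\theta^*_{01} - \theta^*_{00})E(p_\infty q_\infty)\}
     - (1 - Z_n)\{\theta_{00}E(q_\infty) + (\theta_{01} - \theta_{00})E(q_\infty^2)\}. \]

(It is assumed that $Z_n$ is independent of $p_n$ and $q_n$.) If the asymptotic
distribution has all its density at $p = 1$ and $q = 0$, then
$E(p_\infty) = E(p_\infty^2) = 1$ and $E(q_\infty) = E(q_\infty^2) = E(p_\infty q_\infty) = 0$. These values
satisfy the preceding two equations if and only if $\theta^*_{00} = 0$. The uniqueness
of the asymptotic distribution is already established by Krantz's proof. Q.E.D.

Theorem 3 states that perfect learning occurs asymptotically only if the
effects of reward do not generalize. The effects of nonreward may
generalize, however; if they do not, the model reduces to a linear version
of the nongeneralization models discussed in Sec. 3. The psychological
interpretation of a learning process that permits nonreward to generalize
from one stimulus presentation to another but does not permit reward to
generalize is not evident.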
4.2 Experimenter-Controlled Operators

Condition 5 of Sec. 2.6 sets the reward and nonreward effects equal,
and as a result the models become experimenter-controlled. For the linear
models considered here, the restrictions on the parameters are

\[ \theta_{11} = \theta_{10} = \theta_1, \qquad \theta_{01} = \theta_{00} = \theta_0, \]
\[ \theta^*_{11} = \theta^*_{10} = \theta^*_1, \qquad \theta^*_{01} = \theta^*_{00} = \theta^*_0, \]

and the transition equations become

\[
p_{n+1} =
\begin{cases}
p_n + \theta_1(1 - p_n) & \text{if } Z_n = 1, \\
p_n - \theta^*_0p_n & \text{if } Z_n = 0,
\end{cases}
\qquad
q_{n+1} =
\begin{cases}
q_n + \theta^*_1(1 - q_n) & \text{if } Z_n = 1, \\
q_n - \theta_0q_n & \text{if } Z_n = 0.
\end{cases}
\]

This is a model for recognition learning described by Bush, Luce, and
Rose (1964). They proved the following theorem.

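Theorem 4. The marginal means of the asymptotic distribution are given
by

\[ E(p_\infty) = \frac{1}{1 + \eta b}, \qquad E(q_\infty) = \frac{\eta}{\eta + b}, \]

where

\[ \eta = \sqrt{\frac{\theta^*_0\theta^*_1}{\theta_0\theta_1}}, \qquad
   b = \frac{1 - P}{P}\sqrt{\frac{\theta_0\theta^*_0}{\theta_1\theta^*_1}}, \qquad
   P = \Pr(Z_n = 1). \]

As pointed out by Bush, Luce, and Rose (1964), these asymptotic
means have the same form as the asymptotes predicted by Luce's choice
model, which is based on entirely different considerations. Furthermore,
it is easy to show that

\[ \frac{1 - E(p_\infty)}{E(p_\infty)} \cdot \frac{E(q_\infty)}{1 - E(q_\infty)} = \eta^2, \]

which is the equation of the "isosensitivity" curve or "ROC" curve. In the
interpretation of the choice model, $\eta$ is a similarity index and thus depends
only on properties of the stimulus presentations, whereas $b$ is a bias
parameter determined by the stimulus presentation probability as well.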
It follows immediately from Theorem 4 that perfect asymptotic learning
can occur in the present model only if $\theta^*_1 = \theta^*_0 = 0$, that is, only if
there are no generalization effects, and so once again we have a model of
the type described in Sec. 3.
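The asymptotic means can be checked numerically with a minimal Python sketch. It assumes the experimenter-controlled linear transition rules $p_{n+1} = p_n + \theta_1(1 - p_n)$ or $p_n - \theta^*_0p_n$, and $q_{n+1} = q_n + \theta^*_1(1 - q_n)$ or $q_n - \theta_0q_n$, according to whether $Z_n = 1$ or $Z_n = 0$, with $Z_n$ independent of $p_n$ and $q_n$. Because the rules are linear, the expected values obey an exact recursion whose fixed point is written out in `asymptotic_means`. The function names and parameter values are ours.

```python
def asymptotic_means(P, theta1, theta0, theta1_star, theta0_star):
    """Fixed point of the recursion for E(p_n) and E(q_n)."""
    Ep = P * theta1 / (P * theta1 + (1.0 - P) * theta0_star)
    Eq = P * theta1_star / (P * theta1_star + (1.0 - P) * theta0)
    return Ep, Eq


def iterate_means(P, theta1, theta0, theta1_star, theta0_star,
                  p1, q1, n_trials):
    """Iterate the exact recursion for the means, starting at (p1, q1)."""
    mp, mq = p1, q1
    for _ in range(n_trials):
        mp = mp + P * theta1 * (1.0 - mp) - (1.0 - P) * theta0_star * mp
        mq = mq + P * theta1_star * (1.0 - mq) - (1.0 - P) * theta0 * mq
    return mp, mq


# Hypothetical parameter values: direct effects 0.2, generalized effects 0.05.
params = dict(P=0.5, theta1=0.2, theta0=0.2, theta1_star=0.05, theta0_star=0.05)
Ep, Eq = asymptotic_means(**params)
mp, mq = iterate_means(p1=0.5, q1=0.5, n_trials=2000, **params)

# Product of odds that defines the isosensitivity (ROC) curve.
roc_constant = (1.0 - Ep) / Ep * Eq / (1.0 - Eq)
```

With these values the means settle at 0.8 and 0.2, short of perfect learning, and `roc_constant` equals $\theta^*_0\theta^*_1/(\theta_0\theta_1) = 0.0625$, which does not depend on $P$.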


4 3 Position Habits® 

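If we find, as $n \to \infty$, either that $p_n \to 0$ and $q_n \to 0$ or that $p_n \to 1$
and $q_n \to 1$, we say that a position habit has developed. Conditions under
which this behavior is predicted by the present linear models are now
examined. For the position habit (0, 0) to develop, we must require first
that the point (0, 0) be a fixed point of the process, and second that the
point (0, 0) also be an absorbing point, which means that there must be a
nonzero probability that $X_n = 0$ for all $n$. Similar remarks apply to the
position habit (1, 1).

In this section, only Conditions 1, 2, and 3 are imposed. We see at once
that

\[ E(p_{n+1} \mid p_n = q_n = 0) = Z_n\theta_{10}, \qquad
   E(q_{n+1} \mid p_n = q_n = 0) = Z_n\theta^*_{10}, \]
\[ E(p_{n+1} \mid p_n = q_n = 1) = 1 - (1 - Z_n)\theta^*_{01}, \qquad
   E(q_{n+1} \mid p_n = q_n = 1) = 1 - (1 - Z_n)\theta_{01}. \]

Thus the points (0, 0) and (1, 1) are fixed points only if
$\theta_{10} = \theta^*_{10} = \theta_{01} = \theta^*_{01} = 0$, and these restrictions mean that
nonreward has no effect. Whether or not this seems reasonable, it is
assumed in what follows. We have

\[
p_{n+1} =
\begin{cases}
p_n + \theta_{11}(1 - p_n) & \text{if } Z_n = 1,\ X_n = 1, \\
p_n & \text{if } Z_n = 1,\ X_n = 0, \\
p_n & \text{if } Z_n = 0,\ X_n = 1, \\
p_n - \theta^*_{00}p_n & \text{if } Z_n = 0,\ X_n = 0,
\end{cases}
\qquad
q_{n+1} =
\begin{cases}
q_n + \theta^*_{11}(1 - q_n) & \text{if } Z_n = 1,\ X_n = 1, \\
q_n & \text{if } Z_n = 1,\ X_n = 0, \\
q_n & \text{if } Z_n = 0,\ X_n = 1, \\
q_n - \theta_{00}q_n & \text{if } Z_n = 0,\ X_n = 0.
\end{cases}
\]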


. . K..lrh RoskKS r»' •»oeu«x m dovtopin; Ihe proof. ■ 
•lam indebted lo l 
section 
\[ E(p_{n+1} \mid p_n = q_n = 0) = E(q_{n+1} \mid p_n = q_n = 0) = 0, \]

and

\[ E(p_{n+1} \mid p_n = q_n = 1) = E(q_{n+1} \mid p_n = q_n = 1) = 1, \]

which means that the points (0, 0) and (1, 1) are fixed points. It is also
true, however, that

\[ E(p_{n+1} \mid p_n = 0, q_n = 1) = 0, \]
\[ E(q_{n+1} \mid p_n = 0, q_n = 1) = 1, \]

which means that the point (0, 1) is another fixed point. The point (1, 0),
which corresponds to perfect learning, is not a fixed point, however, unless
$\theta^*_{11} = \theta^*_{00} = 0$; this case, examined in Sec. 4.1, is not considered here.

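The next task is to show that the points (0, 0) and (1, 1) are absorbing
points. We have

Theorem 5. For the linear model with $\theta_{10} = \theta^*_{10} = \theta_{01} = \theta^*_{01} = 0$,

\[ \Pr\bigl(\lim_{n\to\infty}\phi_n = 0\bigr) = 0, \]

where

\[ \phi_n = \Pr(X_1 = X_2 = \cdots = X_n = 0 \mid Z_1, Z_2, \ldots, Z_n), \]

provided only that $p_1, q_1 \ne 1$, $\theta_{00}, \theta^*_{00} \ne 0$, and $P \ne 1$.

Similarly,

\[ \Pr\bigl(\lim_{n\to\infty}\psi_n = 0\bigr) = 0, \]

where

\[ \psi_n = \Pr(X_1 = X_2 = \cdots = X_n = 1 \mid Z_1, Z_2, \ldots, Z_n), \]

provided only that $p_1, q_1 \ne 0$, $\theta_{11}, \theta^*_{11} \ne 0$, and $P \ne 0$.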
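PROOF. Define random variables on the unit interval by

\[ A_n = Z_n + (1 - Z_n)(1 - \theta^*_{00}), \qquad
   A^*_n = Z_n + (1 - Z_n)(1 - \theta_{00}). \]

Then

\[
\begin{aligned}
\phi_n &= \Pr(X_1 = 0 \mid Z_1)\,\Pr(X_2 = 0 \mid X_1 = 0;\ Z_1, Z_2) \cdots \\
       &\qquad \Pr(X_n = 0 \mid X_1 = X_2 = \cdots = X_{n-1} = 0;\ Z_1, Z_2, \ldots, Z_n) \\
       &= [1 - Z_1p_1 - (1 - Z_1)q_1]\,[1 - Z_2A_1p_1 - (1 - Z_2)A^*_1q_1] \cdots \\
       &\qquad [1 - Z_nA_{n-1}A_{n-2}\cdots A_1p_1 - (1 - Z_n)A^*_{n-1}A^*_{n-2}\cdots A^*_1q_1].
\end{aligned}
\]

Now let

\[ s = \max(p_1, q_1), \qquad \theta = \min(\theta_{00}, \theta^*_{00}), \qquad
   B_n = Z_n + (1 - Z_n)(1 - \theta). \]

Then, since $A_k \le B_k$, $A^*_k \le B_k$, and $p_1, q_1 \le s$,

\[ \phi_n \ge (1 - s)(1 - sB_1)\cdots(1 - sB_{n-1}B_{n-2}\cdots B_1)
          = \prod_{k=0}^{n-1}(1 - s\pi_k), \]

where $\pi_k = B_kB_{k-1}\cdots B_1$ and $\pi_0 = 1$. Taking logarithms,

\[ -\log\phi_n \le -\sum_{k=0}^{n-1}\log(1 - s\pi_k)
   = \sum_{k=0}^{n-1}\sum_{j=1}^{\infty}\frac{(s\pi_k)^j}{j}. \]

Interchanging the order of summation,

\[ -\log\phi_n \le \sum_{j=1}^{\infty}\frac{s^j}{j}\sum_{k=0}^{n-1}\pi_k^j. \]

The expectation then satisfies

\[ E(-\log\phi_n) \le \sum_{j=1}^{\infty}\frac{s^j}{j}\sum_{k=0}^{n-1}E(\pi_k^j). \]

Now the $B_k$ are random variables defined on the unit interval and so the
values of the $\pi_k$ are in the unit interval. Therefore

\[ E(\pi_k^j) \le E(\pi_k), \qquad j \ge 1. \]

However, the $B_k$ are independent random variables because the $Z_n$ are
assumed independent. Thus

\[ E(\pi_k) = E(B_kB_{k-1}\cdots B_1) = E(B_k)E(B_{k-1})\cdots E(B_1)
            = [P + (1 - P)(1 - \theta)]^k, \]

and

\[ \sum_{k=0}^{n-1}E(\pi_k^j) \le \sum_{k=0}^{n-1}[P + (1 - P)(1 - \theta)]^k
   = \frac{1 - [P + (1 - P)(1 - \theta)]^n}{(1 - P)\theta}. \]

Thus we have

\[ E(-\log\phi_n) \le \sum_{j=1}^{\infty}\frac{s^j}{j}\cdot
   \frac{1 - [P + (1 - P)(1 - \theta)]^n}{(1 - P)\theta}
   = [-\log(1 - s)]\,\frac{1 - [P + (1 - P)(1 - \theta)]^n}{(1 - P)\theta}, \]

and taking limits,

\[ \lim_{n\to\infty}E(-\log\phi_n) \le \frac{-\log(1 - s)}{(1 - P)\theta}. \]

The requirement that $p_1, q_1 \ne 1$ implies that $s \ne 1$; the requirement that
$\theta_{00}, \theta^*_{00} \ne 0$ implies $\theta \ne 0$. We have also required that $P \ne 1$, and so the
right-hand side of the last inequality is finite.
We conclude, therefore, that as $n \to \infty$, the expectation of $-\log\phi_n$
is finite and so the asymptotic distribution of $\phi_n$ can have no density at
zero. This proves the first part of the theorem. The second part is proved
by a parallel argument. Q.E.D.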

Theorem 5 says that, with probability 1, the probability of an infinite
sequence of all $r_0$ responses and the probability of an infinite sequence of
all $r_1$ responses are both positive, provided the parameters and starting
values are appropriately bounded away from 0 and 1. Thus the points
(0, 0) and (1, 1) are absorbing points as asserted.

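Finally, it will be shown that the point (0, 1) is not an absorbing point.
We define a random variable $Y_n$ which has the value 1 if $X_n = 0$ when
$Z_n = 1$ or if $X_n = 1$ when $Z_n = 0$; it has the value 0 otherwise. Thus

\[ \Pr(Y_n = 1 \mid Z_n) = (1 - p_n)Z_n + q_n(1 - Z_n), \]

which goes to 1 if $(p_n, q_n)$ goes to (0, 1). For the point (0, 1) to be an
absorbing point, we would need to show that the infinite sequence
$Y_1 = Y_2 = \cdots = 1$ has a positive probability of occurring. This is not
true because of

Theorem 6. If $p_1 \ne 0$ and $q_1 \ne 1$,

\[ \lim_{n\to\infty}\Pr(Y_1 = Y_2 = \cdots = Y_n = 1 \mid Z_1, Z_2, \ldots, Z_n) = 0. \]

PROOF.

\[
\begin{aligned}
\Pr(Y_1 = Y_2 = \cdots = Y_n = 1 \mid Z_1, Z_2, \ldots, Z_n)
 &= \Pr(Y_1 = 1 \mid Z_1)\,\Pr(Y_2 = 1 \mid Y_1 = 1;\ Z_1, Z_2) \cdots \\
 &\qquad \Pr(Y_n = 1 \mid Y_1 = Y_2 = \cdots = Y_{n-1} = 1;\ Z_1, Z_2, \ldots, Z_n) \\
 &= [(1 - p_1)Z_1 + q_1(1 - Z_1)]\,[(1 - p_1)Z_2 + q_1(1 - Z_2)] \cdots \\
 &\qquad [(1 - p_1)Z_n + q_1(1 - Z_n)].
\end{aligned}
\]

(Every trial in such a sequence is a nonreward trial, and so $p_n$ and $q_n$
remain at their initial values.) Now let

\[ r = \max(1 - p_1, q_1). \]

Then

\[ [(1 - p_1)Z_k + q_1(1 - Z_k)] \le r, \qquad k = 1, 2, \ldots, n, \]

and so

\[ \Pr(Y_1 = Y_2 = \cdots = Y_n = 1 \mid Z_1, Z_2, \ldots, Z_n) \le r^n. \]

In the limit as $n \to \infty$, $r^n \to 0$ and so this probability tends to zero.
Q.E.D.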

It is not difficult to show that no other absorbing points besides (0, 0)
and (1, 1) exist, and so we conclude that a position habit will develop with
probability 1 if nonreward has no effect but both direct and generalized
rewards have some effect.


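5. SINGLE-PROCESS BETA MODELS

The beta model has been studied extensively by Luce (1959), Bush (1959),
and Kanal (1962a, 1962b). In this section, it is applied to identification
learning within the framework of single-process models.

The learning operators that are applied to the response probabilities
have the form

\[ T_{ij}u = \frac{\beta_{ij}u}{\beta_{ij}u + (1 - u)}, \]

where the $\beta_{ij}$ are nonnegative. If $\beta_{ij} > 1$, then the corresponding
operator increases its operand; if $\beta_{ij} < 1$, it decreases it.
Thus Condition 1 of Sec. 2.2 requires that

\[ \beta_{11} \ge 1, \quad \beta_{10} \ge 1, \quad \beta^*_{11} \ge 1, \quad \beta^*_{10} \ge 1, \]
\[ \beta_{01} \le 1, \quad \beta_{00} \le 1, \quad \beta^*_{01} \le 1, \quad \beta^*_{00} \le 1. \]

Furthermore, Condition 2 of Sec. 2.3 requires that

\[ \beta_{11} \ge \beta^*_{11}, \quad \beta_{10} \ge \beta^*_{10}, \quad
   \beta_{01} \le \beta^*_{01}, \quad \beta_{00} \le \beta^*_{00}, \]

or more briefly, $|\log\beta_{ij}| \ge |\log\beta^*_{ij}|$ for $i, j = 0, 1$.

This follows also from Lemma 1 of Sec. 2.3.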
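5.1 Asymptotic Properties

Without imposing further restrictions on the beta model operators, an
important asymptotic result can be obtained.

Theorem 7. If Conditions 1 and 2 are imposed on the beta model operators
with $|\log\beta_{ij}| > |\log\beta^*_{ij}|$ for $i, j = 0, 1$, then $\lim_{n\to\infty} p_n = 1$ or
$\lim_{n\to\infty} q_n = 0$ or both.

PROOF. If we let

\[ v_n = \frac{p_n}{1 - p_n}, \qquad u_n = \frac{q_n}{1 - q_n}, \]

the beta model operators lead to

\[
v_{n+1} =
\begin{cases}
\beta_{1j}v_n & \text{if } Z_n = 1,\ X_n = j, \\
\beta^*_{0j}v_n & \text{if } Z_n = 0,\ X_n = j,
\end{cases}
\qquad
u_{n+1} =
\begin{cases}
\beta^*_{1j}u_n & \text{if } Z_n = 1,\ X_n = j, \\
\beta_{0j}u_n & \text{if } Z_n = 0,\ X_n = j.
\end{cases}
\]

Now let

\[ r_n = \frac{u_n}{v_n} = \frac{q_n(1 - p_n)}{(1 - q_n)p_n}. \]

Then we have

\[
r_{n+1} =
\begin{cases}
(\beta^*_{1j}/\beta_{1j})\,r_n & \text{if } Z_n = 1,\ X_n = j, \\
(\beta_{0j}/\beta^*_{0j})\,r_n & \text{if } Z_n = 0,\ X_n = j.
\end{cases}
\]

By Conditions 1 and 2, $\beta^*_{1j} \le \beta_{1j}$ and $\beta_{0j} \le \beta^*_{0j}$ for each $j$, and by
hypothesis the inequalities are strict. Thus $r_{n+1} < r_n$ for all $n$; indeed each
of the four ratios is at most the largest of them, which is less than 1, and
so $\lim r_n = 0$. From the definition of $r_n$, it follows that $p_n \to 1$ or
$q_n \to 0$ or both. Q.E.D.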
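The key step in this argument, that a beta operator multiplies the odds $u/(1 - u)$ by $\beta$, can be checked with a short Python sketch. It assumes the operator form $Tu = \beta u/[\beta u + (1 - u)]$; the names and the numerical values are hypothetical.

```python
def beta_operator(u, beta):
    """Beta-model operator T u = beta * u / (beta * u + (1 - u))."""
    return beta * u / (beta * u + (1.0 - u))


def odds(u):
    """Odds u / (1 - u) of a probability u."""
    return u / (1.0 - u)


def odds_ratio(p, q):
    """r = [q / (1 - q)] / [p / (1 - p)], the ratio followed in the proof."""
    return odds(q) / odds(p)


# One s1 trial: p receives the direct effect, q the generalized effect.
p, q = 0.4, 0.6
beta_direct, beta_generalized = 2.0, 1.5   # hypothetical, beta > beta* > 1
r_before = odds_ratio(p, q)
p_next = beta_operator(p, beta_direct)
q_next = beta_operator(q, beta_generalized)
r_after = odds_ratio(p_next, q_next)
```

Here `r_after` equals `r_before` multiplied by `beta_generalized / beta_direct = 0.75`, so the ratio shrinks on the trial even though both probabilities increase.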

Theorem 7 shows that the asymptotic distribution lies entirely on the
edges $p = 1$ and $q = 0$ of the unit square; no density appears in the
interior. To explore the asymptotic properties in more detail we shall need
two lemmas.

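Lemma 4. For a random walk defined by

\[
X_{n+1} =
\begin{cases}
X_n + a_1 & \text{with probability } P, \\
X_n - a_2 & \text{with probability } 1 - P,
\end{cases}
\]

where $a_1, a_2 > 0$, $X_n \to \infty$ with probability 1 provided that
$P > 1/(1 + a_1/a_2)$.

PROOF. Let $Z_n = 1$ if the step on trial $n$ is $+a_1$ and $Z_n = 0$ if it is
$-a_2$. Then

\[ X_{n+1} = X_1 + a_1\sum_{k=1}^{n}Z_k - a_2\sum_{k=1}^{n}(1 - Z_k). \]

The law of large numbers assures us that with probability 1 as $n \to \infty$,

\[ \frac{1}{n}\sum_{k=1}^{n}Z_k \to P. \]

Thus, as $n \to \infty$, the coefficient of $n$ in the expression for $X_{n+1}$
approaches $a_1P - a_2(1 - P)$, and $X_n \to \infty$ with probability 1 provided
$a_1P > a_2(1 - P)$.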
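The drift condition for this random walk can be sketched in Python as follows. The critical value $a_2/(a_1 + a_2)$ is the same quantity as $1/(1 + a_1/a_2)$; the function names, step sizes, and seed are assumptions made for the illustration.

```python
import random


def critical_probability(a1, a2):
    """Value of P at which the mean step P*a1 - (1 - P)*a2 is zero."""
    return a2 / (a1 + a2)


def mean_step(P, a1, a2):
    """Expected change of the walk on one trial."""
    return P * a1 - (1.0 - P) * a2


def simulate_walk(P, a1, a2, n_steps, seed, x1=0.0):
    """Position after n_steps of the walk, starting at x1."""
    rng = random.Random(seed)
    x = x1
    for _ in range(n_steps):
        x += a1 if rng.random() < P else -a2
    return x


a1, a2 = 1.0, 3.0
p_crit = critical_probability(a1, a2)                 # 0.75
upward = simulate_walk(0.9, a1, a2, 10000, seed=0)    # P above p_crit
downward = simulate_walk(0.5, a1, a2, 10000, seed=0)  # P below p_crit
```

With $P = 0.9$ the mean step is $+0.6$ and the simulated walk ends far above its starting point; with $P = 0.5$ the mean step is $-1$ and it ends far below.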

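Lemma 5. Let $\{Y_n\}$ and $\{L_n\}$ be sequences of random variables
defined on the whole real line and let $Y_1$ and $L_1$ be finite. Then,

(i) if $Y_n \to \infty$ with probability 1 and if for all $n$,

\[ Y_{n+1} - L_{n+1} \le Y_n - L_n, \]

then $L_n \to \infty$ with probability 1, and

(ii) if $Y_n \to -\infty$ with probability 1 and if for all $n$,

\[ Y_{n+1} - L_{n+1} \ge Y_n - L_n, \]

then $L_n \to -\infty$ with probability 1.

PROOF. If we sum both sides of the inequality

\[ Y_{n+1} - Y_n \le L_{n+1} - L_n \]

from $n = 1$ to $m - 1$, we obtain

\[ Y_m - Y_1 \le L_m - L_1, \]

or

\[ L_m \ge Y_m - Y_1 + L_1. \]

Thus, if $Y_1$ and $L_1$ are finite and if $Y_m \to \infty$, then
$L_m \to \infty$ also. The proof of the second part of the lemma is similar.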

The first of these lemmas was used by Lamperti and Suppes (1960) in
an analysis of the beta model. The second is used in a "comparison
method," similar to the one used by Lamperti and Suppes, to prove

Theorem 8. If Condition 1 is imposed, we have with probability 1,

(i) $p_n \to 1$ if any one of the following sets of conditions holds:
" 

f hi<Pw 
^ 
(ii) $p_n \to 0$ if any one of these four sets of conditions, with all
inequalities reversed, holds,

(iii) $q_n \to 1$ if any one of the following sets of conditions holds:



A*i<A*o. 

^00 ^01* 

^ ^ P*o> 

(*') 

A*o < 

^10 ^00* 

^ ^ Poi> 


A*i < A’o. 

fiol ^ ^0D> 

^>PiV 

(dO 

A’ < A*i, 


-P > Po*o. 


(iv) $q_n \to 0$ if any one of the sets of conditions in (iii), with all
inequalities reversed, holds, where

1 

, P » 

1 -f Pa 

->og^o/ 


1 

1 [ log 
“log^e: 


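PROOF. Let $L_n = \log[p_n/(1 - p_n)] = \operatorname{logit} p_n$, $b_{ij} = |\log\beta_{ij}|$, and
$b^*_{ij} = |\log\beta^*_{ij}|$. Condition 1 requires that $\beta_{11}, \beta_{10} > 1$ and
$\beta^*_{01}, \beta^*_{00} < 1$. The process can then be described by the random walk

\[
L_{n+1} =
\begin{cases}
L_n + b_{11} & \text{if } Z_n = 1,\ X_n = 1, \\
L_n + b_{10} & \text{if } Z_n = 1,\ X_n = 0, \\
L_n - b^*_{01} & \text{if } Z_n = 0,\ X_n = 1, \\
L_n - b^*_{00} & \text{if } Z_n = 0,\ X_n = 0.
\end{cases}
\]

Now define the following four comparison processes: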


M, + b„ 
— *00 
f«n + *|o 

K - *0. 

Wo-*.. 
P. + *.0 
Wo-*«, 


‘f^o = I, 
if Z„ = 0, 
lfz„ = 1, 
■f Z„ = 0, 

■f z. = 1, 

ifz, = 0, 
■fZ, = 1, 
If Z„ = 0 


C.„ 
From Lemma 4 and the definitions of the $\rho_{ij}$, we see that as $n \to \infty$, with
probability 1,

CO if P> Pio> 


Bn 

Cn 

■D« 


► CO if P> Pou 

► 00 if P > Pin 

► 00 if P "> poo 


Furthermore, we see that 

(B„« - i.+i) <; fl. - ^ 

(c„+.-x.„+o^ Cn-i- 'f 


and ^01 ^ ^00 
and ^00 ^ ^oi 
and ^00 ^ ^01 






Thus, part (i) of Lemma 5 applies and leads to part (i) of the theorem.
Part (ii) of the theorem is proved in the same manner by reversing all
inequalities. Parts (iii) and (iv) are similarly proved for the $\{q_n\}$ process.

Theorem 8 is valid when only Condition 1 of Sec. 2 is imposed, but

Conditions 2 and 3 of that section pa 

elusions First. Condition (iv) are impossible 

corresponding statements of P ^ ^ \ ' on Conditions 1 and 

Furthi;more.itfoUowsfromLomma ^ 

2, that pu < Pn end Poo < Poo 
(rt Pn < Poo end pfi < pjo. 
or 

(u) Pn > Poo and pf, > Poo 

These conclusions lead to the ,hat one of the following 

Lemma 6 Conditions 1, 2, andio] bee s. h 
orderings must hold ^ 

Pn ^ Pn ^ Poo ^ Po^’ 

Pn ^ poo ^ Pn ^ Poo^ 
poa^ P^o^ ^ 

Pm ^ ^ ^ . , 

, and Theorem 8, the following corollary is readily 


(n) 

(h) 

(e) 

(d) 

From this lemma ; 


proved 
Corollary 1. If either ordering (a) or (b) of Lemma 6 holds, then with
probability 1,

Pn~~rl if P 1> Poo, 
p„-^0 if P<ft„ 

if P >/>„•„, 

?.-*0 ,/ P<p*, 

whereas if ordering (c) or (d) of Lemma 6 holds, then with probability 1,
p„-»l if P>ft„ 

P.-O ,/ P<p„„, 
if P>pt„ 

?„-0 ,/ P<p.„ 

These conclusions provide us with a more complete picture of what
happens asymptotically. Figure 6 depicts the results for the four orderings
listed in Lemma 6. As can be seen, the position habit (0, 0) develops when
Pis less than both and poo, and the position habit (1, 1) develops when 

P 15 larger than both pf, and pj, 

It should be stressed that, because of the method of proof used, the
conditions that derive from Theorem 8 are only sufficient conditions.
Thus the question marks in Fig. 6 indicate our present state of ignorance;
they do not mean that $p$ and $q$ tend to values other than 0 and 1. In the
middle regions of Figs. 6a and 6c, we know from Theorem 7 that $p \to 1$
or $q \to 0$ or both, but Theorem 8 is silent about which actually occurs.


5.2 Equal Presentation Probabilities

In most, if not all, published papers on identification learning,
the two stimuli are presented with equal probabilities. Thus the special
case of $P = \tfrac{1}{2}$ with the present model is of interest. In particular we would
like to find conditions, if they exist, for the development of position habits
when $P = \tfrac{1}{2}$. Such conditions can be obtained directly from Corollary 1
and the definitions of $\rho_{ij}$ and $\rho^*_{ij}$. We have

Corollary 2. If ordering (a) or (b) of Lemma 6 applies, then when $P = \tfrac{1}{2}$

the position habit (0, 0) develops provided that 

^iifti < 1 < '■ 

and the position habit (1, 1) develops provided that 

Mx > • ^ ' 

If ordering (c) or W occurs and P = i. then the position habit (0, 0) 
develops provided that 

Aofto < 1 ft'” Wo < ’■ 

nhereas the position habit (1, 1) develops provided that 

AiAi > 1 > ’ 

Nothing we have assumed prevents the several conditions in this
corollary from holding, but as we shall see in the next section, they imply
an asymmetry in the experimental situation.


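5.3 Symmetric Operators

When the experimental situation is symmetric, it is natural to
impose Condition 4 of Sec. 2. For the beta model considered here, that
implies

\[ \beta_{11}\beta_{00} = 1, \qquad \beta_{10}\beta_{01} = 1, \qquad
   \beta^*_{11}\beta^*_{00} = 1, \qquad \beta^*_{10}\beta^*_{01} = 1, \]

and that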

Pn + Poo “ U Poo + Pii ~ 1- 
Furthermore, Lemma 6 and Condition 4 yield 

Lemma 7 Conditions 1, 2, 3, and 4 of Sec 2 require that one of the 
following pair of inequalities holds 


w 

Al^Ol ^ 1 

and 

^10^00 ^ If 

(b) 

^ll^Ol ^ 1 

and 

^10^00 ^ If 

W 

1 

and 

fto^oo ^ ff 

w 

^11^01 ^ 1 

and 

Ao^oo ^ I 


Note that orderings (b) and (d) of Lemma 6 imply identical conditions
on the $\beta_{ij}$ when Condition 4 is also assumed.

From this lemma, we can see at once that none of the conditions of
Corollary 2 can hold when Condition 4 holds. Thus, when $P = \tfrac{1}{2}$ and
Condition 4 is met, we have found no conditions for which position habits
develop. This does not prove that they cannot develop, however, because
we are dealing only with sufficient conditions, not necessary ones. On the
other hand, it is clear from Figs 6b and 6d that when + pJo * ^ 

Poo + p,i « I, then with probability 1, — 1 and q„-<^0 From Figs 

6a and 6c, no such inference can be made, however, presumably because 
our sufficient conditions are not strong enough 


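5.4 Experimenter-Controlled Operators

The single-process beta model is greatly simplified if we assume that
reward and nonreward effects are equal, as specified by Condition 5 of
Sec. 2.6; this implies that

\[ \beta_{11} = \beta_{10} = \beta_1, \qquad \beta^*_{11} = \beta^*_{10} = \beta^*_1, \]

and

\[ \beta_{01} = \beta_{00} = \beta_0, \qquad \beta^*_{01} = \beta^*_{00} = \beta^*_0. \]

Furthermore, $\rho_{ij} = \rho$ and $\rho^*_{ij} = \rho^*$ for all $i$ and $j$.
Conditions 1 and 2 require that $\rho \le \rho^*$. Corollary 1 simplifies to

Corollary 3. If Conditions 1, 2, 3, and 5 are met, then with probability 1,

\[ p_n \to 1 \ \text{if } P > \rho, \qquad p_n \to 0 \ \text{if } P < \rho, \]
\[ q_n \to 1 \ \text{if } P > \rho^*, \qquad q_n \to 0 \ \text{if } P < \rho^*. \]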
1 // P > p*, tf p<^p* 
This result completely specifies the asymptotic distribution for all values
of $P$ except $P = \rho$ and $P = \rho^*$, values of $P$ that are not likely to be of
serious interest. Figure 7 depicts this result. If the symmetry condition
is also imposed, then $\rho + \rho^* = 1$ and so if $\rho \ne \rho^*$, $p_n \to 1$ and $q_n \to 0$
when $P = \tfrac{1}{2}$. Only if the experimental situation is asymmetric can position
habits develop when $P = \tfrac{1}{2}$. It is clear that when $P = \tfrac{1}{2}$, the position
habit (0, 0) can develop if $\rho > \tfrac{1}{2}$, which implies that $\beta_1\beta^*_0 < 1$, which, in
turn, implies that the direct effects of $s_1$ presentations are weaker than the
generalized effects of $s_0$ presentations. Similarly, when $P = \tfrac{1}{2}$, the position
habit (1, 1) can develop if $\rho^* < \tfrac{1}{2}$, that is, if $\beta^*_1\beta_0 > 1$, which means that
the direct effects of $s_0$ presentations are weaker than the generalized effects
of $s_1$ presentations. Either situation suggests a perceptual asymmetry, a
strong bias toward one of the stimulus presentations. On the other hand,
the model predicts that position habits develop in symmetric experiments
provided $P$ is sufficiently small or sufficiently large.

Fig. 7. Implications of Corollary 3 for the experimenter-controlled beta
model. (Horizontal axis: $P$, marked at 0, $\rho$, $\rho^*$, and 1.)
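The classification in Corollary 3 can be sketched in Python. The sketch rests on an assumption that should be stated plainly: the formulas used for $\rho$ and $\rho^*$ are reconstructed by applying Lemma 4 to the random walks followed by $\operatorname{logit} p_n$ (steps $+\log\beta_1$ with probability $P$ and $\log\beta^*_0$ otherwise) and $\operatorname{logit} q_n$ (steps $+\log\beta^*_1$ and $\log\beta_0$); they are not copied from a legible definition. They do agree with the statements above that $\rho > \tfrac{1}{2}$ implies $\beta_1\beta^*_0 < 1$ and that $\rho^* < \tfrac{1}{2}$ implies $\beta^*_1\beta_0 > 1$. The function names and parameter values are hypothetical, and the code requires $\beta_1, \beta^*_1 > 1$ and $\beta_0, \beta^*_0 < 1$ strictly.

```python
import math


def rho(beta1, beta0_star):
    """Critical presentation probability for the p process (assumed form)."""
    up = math.log(beta1)            # step of logit p on s1 trials
    down = -math.log(beta0_star)    # size of the downward step on s0 trials
    return down / (up + down)


def rho_star(beta1_star, beta0):
    """Critical presentation probability for the q process (assumed form)."""
    up = math.log(beta1_star)
    down = -math.log(beta0)
    return down / (up + down)


def asymptote(P, beta1, beta0, beta1_star, beta0_star):
    """Limits of (p_n, q_n); None marks the undecided case of equality."""
    r, rs = rho(beta1, beta0_star), rho_star(beta1_star, beta0)
    p_lim = 1 if P > r else 0 if P < r else None
    q_lim = 1 if P > rs else 0 if P < rs else None
    return p_lim, q_lim


# A symmetric example: beta1 * beta0 = 1 and beta1_star * beta0_star = 1.
betas = dict(beta1=4.0, beta0=0.25, beta1_star=2.0, beta0_star=0.5)
perfect = asymptote(0.5, **betas)     # (1, 0): perfect learning
habit_00 = asymptote(0.2, **betas)    # (0, 0): position habit
habit_11 = asymptote(0.9, **betas)    # (1, 1): position habit
```

In this example $\rho = 1/3$ and $\rho^* = 2/3$, so equal presentation probabilities give perfect learning while sufficiently extreme values of $P$ give position habits.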


6. CONCLUDING REMARKS

The analysis of single-process models, presented in the preceding
sections, has not led to a satisfactory, parsimonious theory of identification
learning for the following reasons. The nongeneralization type of model
defined in Sec. 3 implies an independence of stimulus presentations that is
very hard to believe in spite of the lack of relevant data. The linear models
of Sec. 4 provide a not unreasonable description of human recognition
experiments that use confusable stimuli, but they cannot predict the
development of either perfect learning or of position habits without rather
implausible restrictions on the parameters. Perfect learning cannot occur
unless the effects of reward do not generalize from one stimulus presen-
tation to the other; position habits cannot develop unless nonreward has
no direct or generalized effects. The beta models of Sec. 5, on the other
hand, predict quite plausible asymptotic behavior for experiments with
nonconfusable stimuli; reasonable restrictions on the parameters and on
the presentation probability $P$ can lead to either perfect learning or
position habits or both. However, the beta models cannot predict the
sort of data obtained in human recognition experiments unless very
reasonable restrictions (Conditions 1, 2, and 3) are eliminated.

It appears that if single-process models are to be taken seriously,
something like the linear models must be used when the stimulus presen-
tations are confusable, but beta-like models are more appropriate when
they are not confusable. This lack of parsimony is hardly tolerable. The
general conditions (Conditions 1, 2, and 3), which lead to the bulk of the
conclusions reached, are not easily removed from any sensible theory,
however. The alternatives are two: (1) dual-process models and (2)
Markovian-type models such as those described in Sec. 5 of Chapter 10
or in Chapter 18 of this Handbook.

An experimental observation, not mentioned previously in this chapter,
is the so-called facilitation effect of overlearning. It is often observed that
overlearning increases the rate of relearning. In the numerous experiments
on this phenomenon, various measures of learning rate have been used, but
it seems evident that when facilitation occurs, the relearning curve of
overlearned animals must cross the relearning curve of nonoverlearned
animals. The overlearned ones start relearning at a lower probability of
being correct, but must overtake their control group if facilitation, how-
ever defined, is said to occur. It has not been proved, but it appears
evident, that no single-process model of the type defined in Sec. 2 can
predict such a facilitation effect. Perhaps this is reason enough to reject
such single-process models, but, as usual, the experimental evidence is not
overwhelming; facilitation sometimes occurs and sometimes does not.
A single unified formal theory of identification learning is yet to come. It
must specify the conditions under which imperfect asymptotic learning,
perfect learning, position habits, and the facilitation effect occur in a
variety of human and animal experiments.
Concept Utilization


The purpose of this chapter is to present an overview of the mathematical
treatments of concept utilization which are currently appearing in the
psychological literature. There are four treatments and they can be sorted
into four distinct categories: concept utilization as paired-associate learn-
ing, as cue conditioning, as strategy selection, and as a combination of
selection and conditioning.

In the typical concept-utilization experiment, a number of stimuli are
presented to the subject, either singly or in certain combinations, who then
makes one of a number of well-defined responses. Such situations differ
from ordinary paired-associate learning in that (1) the number of allowable
responses is less than the number of stimuli, and hence at least some of the
responses must be given to more than one stimulus, and (2) stimuli
associated with the same response are related in some way. For example,
consider eight stimuli, each of which consists of a different combination of
two levels each of form, color, and size, and let the response $R_1$ or $R_2$ be
correct whenever the form circle or square, respectively, is presented,
regardless of the form's size or color. The four stimuli associated with $R_1$
have circularity in common and the four associated with $R_2$ squareness.
The subject's task is to learn to choose between $R_1$ and $R_2$ according to
the form presented. Concept-utilization experiments are thus studies of
choice behavior (see Chapter 2) under conditions where the stimuli on
which choices are based are categorizable in more than one way.

Consider in more detail the stimulus situation just described. Form is
a dimension on the basis of which all the stimuli of the problem can be
categorized; that is, each stimulus is clearly either a circle or a square,
and similarly for the color and size dimensions. Squareness and circularity
are values on the form dimension; specific values on a dimension are
often referred to as cues of the stimulus situation. Generally speaking, in
concept-utilization experiments the dimensions are obvious and the values
within a dimension are readily distinguishable; the subject is not required
to engage in any stimulus-differentiation learning.

If, as before, $R_1$ is correct whenever the stimulus is circular and $R_2$ is
correct whenever it is square, regardless of color and size, the form
dimension is said to be relevant and the other two irrelevant. When one
binary dimension is relevant, regardless of how many other dimensions
there may be, the subject has a two-choice problem: the subject must
choose between $R_1$ and $R_2$. If form and color are both relevant so that
four responses must be appropriately identified with the four form-color
combinations, the subject has a four-choice problem. In certain concept-
utilization problems, both form and color may be relevant but redundant;
for example, squareness and redness may always appear together. Here
the subject has a two-choice problem where response choice may be based
on either form alone, color alone, or both simultaneously. In any
situation, however, all that is required of the subject is that on presentation
of a stimulus he choose one from a number of responses. The subject
knows from the start what the possible responses are; furthermore, all
responses are perfectly available, that is, the subject need not engage in
any response learning. The final characteristic to note about responding
is that all the responses specific to a given problem are utilized in criterial
performance.

Thus, whatever concept utilization is (paired-associate learning, cue 
conditioning, strategy selection, or some combination), theorists and 
investigators do not conceive of it as including either stimulus-differ- 
entiation or response-learning processes 


1. COMPONENT FORMULATIONS

The seemingly standard division of psychological thinking into that
concerning the stimulus situation as it is effective for a given organism,
the processes that may be inferred of the organism, and the organism's
actual overt behavior suggests that mathematical treatments of behavior
might conveniently be studied in a similar fashion. In this chapter, an
attempt is made to organize discussion of each of the treatments of concept
utilization according to this tripartite schema, thereby isolating as much
as possible the assumptions and implications peculiar to each of the three
components and facilitating comparison among the treatments.

A preliminary discussion of the type of situation to which the treatment
applies, plus the assumptions which specify how the organism is affected by
the various aspects of the stimuli presented, is the stimulus component of
the formulation. The hypothesized processes (conditioning processes) of
the organism, such as how connections are formed, how they change as a
result of experience, and what is meant by reinforcement, are, together,
called the conditioning component. Finally, the response component is the
set of assumptions, often a single statement, which tells how the various
conditioning states are translated into some response measure.


2 CONCEPT UTILIZATION AS 
PAIRED-ASSOCIATE LEARNING 

In the standard paired-associate (PA) task, where the pairs $S_i$-$R_i$ are
to be learned, three distinct processes may occur. First, in
order successfully to form the pair $S_i$-$R_i$, $S_i$ must be distinguishable from
the other stimuli; thus, to the degree that the stimuli are similar among
themselves, the subject is faced with a stimulus differentiation
(discrimination) task. Second, in the event that the responses are either
meaningless verbal units or never-before practiced actions, the subject, in
order ever to score a correct response (CR), must acquire the ability to make
the response. This process is the response learning phase of the familiar
two-phase conceptualization of learning. The third process in the PA
task is that of actually forming associative connections between the
stimulus and response members of the pairs. It could be safely argued
that the first and second processes, stimulus differentiation and response
learning, are independent; however, there is substantial evidence which
demonstrates that the third process, association formation, is dependent
on both the other two. Mathematical formulations of learning treat only
the association formation phase; the importance of stimulus differentiation
and response learning in the over-all process is reflected in the general
learning-rate parameter, if not specifically assumed to be absent.
Bower (1961, 1962), in an application of stimulus-sampling theory to
PA learning, assumed that each $S_i$ can be represented by a single element
and that this element is either conditioned or not conditioned to $R_i$. He
further assumed that the single element representing $S_i$ is sampled on
every trial without fail and, if it is not already in the conditioned state,
becomes conditioned to $R_i$ with probability $\theta$. The probability of a CR,
$R_i$ to $S_i$, is unity if the element representing $S_i$ is in the conditioned
state; otherwise, when there are $N$ responses, the probability is
assumed to be $1/N$. These assumptions can be listed according to the
component processes to which they apply as follows.
Stimulus axiom. Each stimulus item is represented by a single element
which is sampled with probability 1 on every trial.

Conditioning axioms.

(i) An element can be in either of two states: $C_0$, not conditioned, or
$C_1$, conditioned.
(ii) On each reinforced* trial, the probability of a transition from $C_0$
to $C_1$ is a constant $\theta$; the probability of a transition from $C_1$ to $C_0$
is 0.
(iii) Initial condition: all elements are in $C_0$ on the first trial.

Response axiom. If the element is in $C_0$, then the probability of a CR is
$1/N$, where $N$ is the number of response alternatives; if the element is in
$C_1$, then the probability of a CR is 1.
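
As an illustration only, the three groups of axioms can be turned into a short simulation of a single stimulus item. The function name `simulate_item`, the parameter values, and the choice to apply reinforcement after the response on each trial are assumptions of this sketch, not part of Bower's presentation.

```python
import random


def simulate_item(theta, n_responses, n_trials, rng):
    """Simulate one stimulus item under the one-element axioms.

    Returns a list of booleans, True where the response on that trial
    is a correct response (CR).
    """
    conditioned = False  # initial condition: the element starts in C0
    correct = []
    for _ in range(n_trials):
        if conditioned:
            correct.append(True)  # response axiom, state C1
        else:
            # response axiom, state C0: guess among N alternatives
            correct.append(rng.random() < 1.0 / n_responses)
        # conditioning axiom: on a reinforced trial, C0 -> C1 with prob. theta
        if not conditioned and rng.random() < theta:
            conditioned = True
    return correct


rng = random.Random(0)
runs = [simulate_item(0.25, 4, 200, rng) for _ in range(5000)]
first_trial_rate = sum(r[0] for r in runs) / len(runs)
last_trial_rate = sum(r[-1] for r in runs) / len(runs)
```

Under these assumptions the proportion correct on trial 1 should lie near $1/N$, and after many trials essentially every item should be answered correctly, since $C_1$ is never left once entered.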

Several brief comments are in order. Regarding the stimulus axiom,
note that the single-element assumption is a determining factor for the
conditioning axioms, for if two elements were assumed, each stimulus
item would then be associated with one of three (instead of two)
conditioning states: $C_0$, both elements not conditioned; $C_1$, one element
conditioned, where the two elements are not distinguished; and $C_2$, both
elements conditioned. Such a formulation yields some interestingly
different statements.

An interesting point is that even in data used to support Bower’s
model, the stimuli are clearly not unitary. In one study they were pairs of
consonant letters, and it is known that subjects often use only the first
letter of a nonsense syllable (when it is a stimulus) in forming
associations. This makes one suspect that the value of the one-element
assumption is primarily its role in determining the number of states in the
conditioning model and not its verity as a statement about the stimulus
situation.

The assumption that the single element is sampled with probability 1
on every trial is probably reasonable in PA learning, but the subject’s
occasional failure to observe the stimulus is certainly possible, and should
be recognized as a source of error in a poor fit.
With respect to the conditioning axioms, to say that $\theta$ is constant from
trial to trial is the same as saying that prior to the critical event, the shift
from $C_0$ to $C_1$, there is no accumulation of response tendency and that
following the critical event there is no further modification. This means
that the trials preceding the critical event are independent events and that
the trial number of the critical event is independent of both the number
and the outcomes of the preceding trials. Note that specifying such a
probability is a departure from the strict contiguity principle of
the original stimulus-sampling formulations. There, in the event of
reinforcement, any element sampled became conditioned with
probability 1.

* Reinforcement refers to the contiguous presentation of stimulus and response members.

Of interest is the matrix of transition probabilities which restates
the conditioning model. The entries give the probability of moving from
the state listed at the left to the state listed above.

$$
P = \begin{array}{c|cc}
 & C_0 & C_1 \\ \hline
C_0 & 1 - \theta & \theta \\
C_1 & 0 & 1
\end{array}
$$


Each row sums to unity, since if a subject is in, say, $C_0$, then staying in
$C_0$ and moving to $C_1$ exhaust the possible outcomes of the trial. The matrix
$P$ gives the transition probabilities for trial $n$ to trial $n + 1$ for any $n$;
$P^2$ gives the transition probabilities for trial $n$ to trial $n + 2$, the
probabilities of moving from one state to another in exactly two trials; and
$P^n$ gives the probabilities of having moved from one state to another in
exactly $n$ trials:

$$
P^n = \begin{array}{c|cc}
 & C_0 & C_1 \\ \hline
C_0 & (1 - \theta)^n & 1 - (1 - \theta)^n \\
C_1 & 0 & 1
\end{array}
$$

Thus with probability $(1 - \theta)^n$, the subject will still be in $C_0$ after $n$
trials. In all transition matrices $P^n$, $n = 1, 2, \ldots$, the transition
probability of $C_1$ to $C_0$ is zero, as stated in the conditioning model. In the
language of Markov processes, $C_1$ is called an absorbing state.
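
The form of the $n$-step matrix can be checked numerically by repeated multiplication. The following is a minimal sketch; the helper names `mat_mul` and `mat_pow` and the value $\theta = 0.3$ are chosen here purely for illustration.

```python
def mat_mul(a, b):
    """Multiply two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(p, n):
    """Return the n-step transition matrix P^n by repeated multiplication."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix
    for _ in range(n):
        result = mat_mul(result, p)
    return result


theta = 0.3  # illustrative value
P = [[1 - theta, theta],   # row C0: stay in C0, move to C1
     [0.0, 1.0]]           # row C1: absorbing state
P5 = mat_pow(P, 5)
```

The upper-left entry of `P5` should agree with $(1 - \theta)^5$, the lower-left entry should remain zero, and each row should still sum to unity.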

Regarding the response model, to say that $\Pr(\mathrm{CR} \mid C_0) = 1/N$ is to say
that (1) the subject must respond on every trial with one of the responses
from the list, a requirement which in turn requires that (a) no response
learning is necessary and (b) all the responses can easily be made in the
time allotted, and (2) when an element is in $C_0$ each of the $N$ response
alternatives is equally likely to be chosen. With regard to the latter
implication, in some studies integers are used as responses, but it is known
that of the integers 2 through 9, the odd integers are more likely to be
guessed than the even ones; hence a possible cause of a poor fit.

The foregoing model of PA learning is referred to as the one-element
model and is discussed by Atkinson and Estes in Chapter 10 of this
Handbook, where they present derivations within the model of several
statements about the PA situation. A statement not derived by them,
namely, the number of trials before the critical event occurs, will be
derived here as an illustrative example.

Let $\bar{n}_0$ be the expected number of trials the element remains in $C_0$. Then
to calculate $\bar{n}_0$ from the model, simply sum the weighted trial numbers
1, 2, 3, ..., where the weight for a given trial number is the probability
that the critical event occurs on that trial. For example, the weight for
trial 2 is $(1 - \theta)\theta$, the probability that the element did not shift as a
result of the first trial but did as a result of the second, thus making the
second term in the sum $2(1 - \theta)\theta$.


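$$
\begin{aligned}
\bar{n}_0 &= 1\theta + 2(1 - \theta)\theta + 3(1 - \theta)^2\theta + \cdots \\
&= \sum_{n=1}^{\infty} n(1 - \theta)^{n-1}\theta \\
&= \theta \sum_{n=1}^{\infty} n(1 - \theta)^{n-1} \\
&= \theta\left(\frac{1}{\theta^2}\right) \\
&= \frac{1}{\theta}.
\end{aligned}
$$

It remains to show that

$$\sum_{n=1}^{\infty} n(1 - \theta)^{n-1} = \frac{1}{[1 - (1 - \theta)]^2} = \frac{1}{\theta^2}.$$

To do this, first note that

$$\sum_{n=0}^{\infty} (1 - \theta)^{n} = \frac{1}{1 - (1 - \theta)} = \frac{1}{\theta},$$

the sum of a geometric series; then by taking the derivative with
respect to $\theta$ of both sides, we obtain

$$-\sum_{n=1}^{\infty} n(1 - \theta)^{n-1} = -\frac{1}{\theta^2}$$

or

$$\sum_{n=1}^{\infty} n(1 - \theta)^{n-1} = \frac{1}{\theta^2}.$$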

Thus the one-element model predicts that $1/\theta$ trials are expected before
the element moves from $C_0$ to $C_1$. This does not mean, of course, that
$1/\theta$ errors are expected. The probability of an error on any one of these
trials is $1 - (1/N)$; hence of the $1/\theta$ trials prior to conditioning,
$[1 - (1/N)](1/\theta)$ of them are expected to be errors.
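
The two predictions just stated, $1/\theta$ trials up to the critical event and $[1 - (1/N)](1/\theta)$ errors, can be checked by a small Monte Carlo sketch. The function name, the sample size, and the parameter values $\theta = 0.25$ and $N = 4$ are assumptions made here for illustration; every trial up to and including the one on which conditioning occurs is treated as a guessing trial.

```python
import random


def trials_and_errors(theta, n_responses, rng):
    """Run one item until the critical event (C0 -> C1) occurs.

    Returns (trial number of the critical event, number of errors made).
    """
    trial, errors = 0, 0
    while True:
        trial += 1
        # in C0 the guess is wrong with probability 1 - 1/N
        if rng.random() >= 1.0 / n_responses:
            errors += 1
        # reinforcement conditions the element with probability theta
        if rng.random() < theta:
            return trial, errors


rng = random.Random(42)
theta, n_responses = 0.25, 4  # illustrative values
samples = [trials_and_errors(theta, n_responses, rng) for _ in range(20000)]
mean_trials = sum(t for t, _ in samples) / len(samples)
mean_errors = sum(e for _, e in samples) / len(samples)
```

With these values the model predicts a mean of 4 trials and 3 errors; the simulated means should fall close to those figures.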

In applying Bower’s one-element model to the stimulus situation of a
concept-utilization task, an interesting choice must be made. Consider a
task with three binary dimensions of which one, $D_1$, is relevant. What are
the stimuli in this situation? If $D_{11}$ and $D_{12}$ are the two values of
$D_1$, it is possible to take $S_1 = D_{11}$ and $S_2 = D_{12}$
and hence construe the concept-utilization task as that of forming
associations between the two responses $R_1$ and $R_2$ and the two values of the
relevant dimension $D_{11}$ and $D_{12}$. According to this view, the PA task
would consist of learning the pairs $D_{11}$-$R_1$ and $D_{12}$-$R_2$. On the other hand,
since there are eight actual stimuli in the problem, it is possible to
designate each as a distinct stimulus and therefore construe the
concept-utilization task as learning the following eight pairs: $S_i$-$R_1$ for
$i = 1, 2, 3, 4$, and $S_j$-$R_2$ for $j = 5, 6, 7, 8$, where $S_1, \ldots, S_4$ are
instances of $D_{11}$ and $S_5, \ldots, S_8$ are instances of $D_{12}$.

Since Bower’s one-element PA model is already known to work quite
well when the specifications of the response model are met (see Fig. 1,
Chapter 10 of this Handbook), and since it can be applied twice to any
concept-utilization situation, once using $D_{11}$ and $D_{12}$ as the stimuli and
once using the eight instances as stimuli, it seems reasonable that the model
provides a technique for distinguishing the contributions of straight PA
learning and concept utilization to over-all performance on the task. If
the model fits, for example, when $D_{11}$ and $D_{12}$ are taken as the stimuli but
does not fit when the eight instances are taken as the stimuli, then perhaps
it is safe to conclude that the subjects were conditioning $R_1$ and $R_2$ to
conceptual aspects of the stimulus situation and not to specific instances of
those concepts. The task of the rest of this section is to argue that such a
conclusion is not warranted.

Suppes and Ginsberg (1962) conducted just such an experiment and
applied Bower’s one-element PA model twice in the manner outlined.
Using the specific instances as stimuli, they found Bower’s model to fit
excellently; however, using $D_{11}$ and $D_{12}$ as stimuli, the fit was very poor.
In view of this, they suggest that perhaps learning was primarily PA
learning of individual instances and that the concepts ($D_{11}$ and $D_{12}$) were
of only limited value.

In order to fit Bower’s model using $D_{11}$ and $D_{12}$ as the stimuli,
satisfaction of the stimulus axiom requires that all the instances of $D_{11}$ be,
for the subject, indistinguishable among themselves and identified with
the single cue denoted by $D_{11}$. However, the stimuli of a concept-utilization
problem are deliberately selected so as to be unambiguously categorizable
on all the dimensions involved. This means that if the specifications
of the experimental situation are met, satisfaction of the stimulus axiom
is doubtful. Thus an interpreter of the data must consider the additional
alternative that the model is not appropriate. This does not constitute
an argument to the effect that concept utilization cannot be usefully
viewed as concept-wise PA learning; rather it constitutes an argument
to the effect that Bower's one-element model cannot be used to decide
which process best represents the facts, the learning of responses to $D_{11}$
and $D_{12}$ or to instances of $D_{11}$ and $D_{12}$.

A final point of interest will be considered regarding the two-way
application of the one-element model to a concept-utilization task. On an
intuitive basis, it would be expected that if the model fits using individual
instances as stimuli, it will also fit using the $D_{1j}$ as stimuli. [The latter
goodness-of-fit also depends on homogeneity (with respect to ease of
learning, not distinctiveness of stimuli) of the pairs associated with $D_{1j}$.]
This can be seen by considering how the model is applied when using $D_{1j}$
as stimuli. Since all the instances of $D_{1j}$ are taken as identical, the error
score for $D_{1j}$ is the combined number of errors for the instances of $D_{1j}$.
Thus if there are four instances of $D_{1j}$, a single trial for $D_{1j}$ involves four
stimulus presentations, one for each instance, and hence the expected
number of errors per trial for $D_{1j}$ is the combined number of errors per
trial for each instance of $D_{1j}$. This means the learning rate when the $D_{1j}$
are taken as the stimuli should be one-fourth of that when the individual
instances are taken as the stimuli. Suppes and Ginsberg, in fact, report
such a relationship. The actual mechanics of applying Bower’s model
directly to the concepts (the $D_{1j}$'s), then, is seen to be a matter of selectively
blocking the trials and then calculating a new rate parameter for the
blocked data.

Suppes and Ginsberg’s application of Bower’s one-element model
appears to be the only mathematical attempt to treat concept utilization
as a form of PA learning. It has been argued that the model is inadequate
for this purpose; this is because it involves an oversimplified
representation of the stimulus situation. Works by Bower and Trabasso (1963),
using a combination of concept selection and PA learning, will be
considered later.


3 CONCEPT UTILIZATION AS CUE
CONDITIONING

As argued in the preceding section, the stimuli of a concept-utilization
task cannot be represented by a single element. In concept-utilization
experiments the stimuli are deliberately constructed so as to be
unambiguously classifiable onto each of two or more dimensions. For
example, if form, color, and size are the dimensions involved in an
experiment, every stimulus is jointly classifiable according to form,
color, and size. Suppose that each dimension has two values: square and
circle on the form dimension, red and green on the color dimension, and
large and small on the size dimension. A particular stimulus,
say a large green circle, then has at least the three attributes imposed by
the experimenter: largeness, greenness, and circularity. However, note
that other stimuli share some of these attributes; for example, a large
green circle and a small red circle both possess circularity. A shared
attribute, such as circularity, is called a cue. Thus in an ideal situation
where the only cues are those arising from the three dimensions under
discussion, cues and stimuli may be distinguished by a complete listing
of each, as in Table 1.

Table 1 Complete Listing of Cues and Stimuli Associated with the Three
Binary Dimensions Form, Color, and Size

Cues            Stimuli

Squareness      Large red square
Circularity     Large red circle
Redness         Large green square
Greenness       Large green circle
Largeness       Small red square
Smallness       Small red circle
                Small green square
                Small green circle

Briefly, a stimulus is whatever is presented to the subject (for example,
a large green circle), and a cue (for example, greenness) is a value on a
dimension and hence something shared by more than one stimulus.
Some writers call such cues concepts, thus referring to all stimuli possessing
the attribute greenness, say, as instances of the concept green. A theory
about cue conditioning, then, is a theory about the effectiveness of
dimensional values.
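
The distinction between cues and stimuli can be made concrete by generating both listings from the three binary dimensions. The following sketch is offered only as an illustration; the variable names and the lowercase value labels are choices made here.

```python
from itertools import product

# The three binary dimensions and their values (labels chosen for this sketch).
dimensions = {
    "size": ["large", "small"],
    "color": ["red", "green"],
    "form": ["square", "circle"],
}

# A cue is a single value on one dimension.
cues = [value for values in dimensions.values() for value in values]

# A stimulus is one value from every dimension, taken jointly.
stimuli = [" ".join(combo) for combo in product(*dimensions.values())]


def stimuli_sharing(cue):
    """Return every stimulus that possesses the given cue."""
    return [s for s in stimuli if cue in s.split()]
```

For example, `stimuli_sharing("green")` returns the four stimuli that are instances of the concept green, while `cues` holds six entries and `stimuli` holds eight.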

Bourne and Restle’s (1959) cue-conditioning model has as its basic
elements the cues of the stimulus situation. They treat an abstract set
$K$ of cues which includes not only those cues built in by the experimenter
(squareness, redness, etc.) but any other cues, either environmental or
internal, to which the subject might be sensitive. As will become apparent,
it is not necessary to know what the elements of $K$ are or even to know
whether $K$ has very many or only a few elements. The utility of the model
arises from the fact that whatever the contents of $K$ may be, they can be
operationally divided into two kinds. The operation is the experimenter's
reinforcing procedure, and the result is a partitioning* of $K$ into two
subsets: relevant and irrelevant cues. A relevant cue is one to which a
particular response is either always correct or never correct, that is, it is
reinforced with probability 1; an irrelevant cue is one which is reinforced
only randomly (probability $\tfrac{1}{2}$ in the two-choice situation). The
conditioning model of Bourne and Restle’s formulation then treats the two
types of cues differentially: irrelevant cues are assumed to become adapted
and unadapted cues are assumed to become conditioned.

* $K$ is said to be partitioned into the two subsets $K_1$ and $K_2$ if and only if
(1) $K = K_1 \cup K_2$ and (2) $K_1 \cap K_2 = \varnothing$. In other words, $K_1$ and $K_2$
must be (1) exhaustive and (2) mutually exclusive.
The component axioms of Bourne and Restle’s cue-conditioning
formulation may be listed as follows.

Stimulus axioms.

(i) The stimulus situation may be represented as a set $K$ of cues.
(ii) The set $K$ is partitioned by the experimenter's procedures into two
subsets: relevant and irrelevant cues.

Conditioning axioms. Two processes are assumed to operate simultaneously.

(i) Conditioning: on each reinforced* trial, a constant proportion $\theta$ of
the unadapted, unconditioned cues become conditioned; cues already
conditioned remain so; nonreinforcement produces no change.
(ii) Adaptation: on each reinforced trial, a constant proportion $\theta$ of
the unadapted irrelevant cues become adapted; irrelevant cues already
adapted remain so; nonreinforcement produces no change.

(Note: the conditioning and adaptation rates are assumed to be equal,
that is, $\theta$ is the rate parameter for both processes.)

Response axiom. The probability of a response equals the proportion of
unadapted cues which are conditioned to that response.
* Reinforcement here refers to being given knowledge of whether the response
was correct or incorrect; being informed that an incorrect response is wrong
and being informed that a correct response is correct are equally reinforcing.

The statements of both the conditioning and the response axioms may
be written as equations, thus permitting the use of the probability calculus
for deriving new statements. Turning first to the conditioning axiom, let
$k$ be any cue in $K$ and let $F(k, n)$ be the probability that $k$ is conditioned
on trial $n$. Then

$$F(k, n + 1) = F(k, n) + \theta[1 - F(k, n)]. \tag{1}$$

This follows immediately from the conditioning statement: in order for
element $k$ to be conditioned on trial $n + 1$, it must have been
(1) conditioned on trial $n$ and remained so or (2) not
conditioned on trial $n$ but conditioned as a result of reinforcement on that
trial. Since these two alternatives exhaust the possibilities and since they
are mutually exclusive events, their separate probabilities add to give the
probability of $k$ being conditioned on trial $n + 1$. The former alternative
has probability $F(k, n) \cdot 1$, the joint probability of $k$ being conditioned on
trial $n$ and remaining so; the latter has probability $\theta[1 - F(k, n)]$, the
joint probability of $k$ not being conditioned on trial $n$, $1 - F(k, n)$,
and of its becoming conditioned, $\theta$. Equation 1 is a first-order difference
equation, and can be solved for $F(k, n)$ by use of the calculus of finite
differences to give

$$F(k, n) = (1 - \theta)^{n-1}F(k, 1) + [1 - (1 - \theta)^{n-1}]. \tag{2}$$

Note that although $F(k, n + 1)$ is a linear function of $F(k, n)$ in Eq. 1,
$F(k, n)$ is an exponential function of $n$ in Eq. 2. The conditioning model
is thus said to be a linear model since its transition law is a linear equation,
but the trial-by-trial solution for $F(k, n)$ is necessarily exponential.
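
The relation between the linear transition law and its exponential solution can be verified numerically. The following minimal sketch iterates Eq. 1 trial by trial and compares the result with the closed form of Eq. 2; the function names and the values $F(k, 1) = 0.5$ and $\theta = 0.2$ are illustrative assumptions.

```python
def iterate_eq1(f1, theta, n):
    """Apply the linear transition law (Eq. 1) from trial 1 up to trial n."""
    f = f1
    for _ in range(n - 1):
        f = f + theta * (1.0 - f)
    return f


def closed_form_eq2(f1, theta, n):
    """Evaluate the exponential solution (Eq. 2) directly at trial n."""
    decay = (1.0 - theta) ** (n - 1)
    return decay * f1 + (1.0 - decay)


f1, theta = 0.5, 0.2  # illustrative starting value and rate
differences = [abs(iterate_eq1(f1, theta, n) - closed_form_eq2(f1, theta, n))
               for n in range(1, 21)]
```

Every entry of `differences` should be zero up to rounding error, which is what it means for Eq. 2 to solve Eq. 1.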
If reinforcement is not given on trial $n$, then the equation representing
the conditioning model’s assumption that nonreinforcement produces no
change is simply

$$F(k, n + 1) = F(k, n). \tag{3}$$

The question now arises as to what form Eq. 2 assumes when some trials
are reinforced and some are not; that is, if reinforcement is given only on
$100\pi$ percent of the trials, how does $F(k, n)$ increase as a function of trials?
This question can be answered as follows. A response is either reinforced
or not reinforced, and because these two events are exhaustive and
mutually exclusive, their probabilities are combined by simple addition.
Therefore, since the former event occurs with probability $\pi$ and the latter
with probability $1 - \pi$, the probability of an element $k$ being conditioned
on trial $n + 1$ is

$$F(k, n + 1) = \pi\{F(k, n) + \theta[1 - F(k, n)]\} + (1 - \pi)F(k, n), \tag{4}$$

or, after rearranging,

$$F(k, n + 1) = F(k, n) + \pi\theta[1 - F(k, n)], \tag{5}$$

which intuitively is satisfactory: the first term on the right-hand side
represents the case where $k$ was already conditioned, hence it makes no
difference whether or not reinforcement occurred, and the second term
represents the case where $k$ was not conditioned but becomes so with
probability $\pi\theta$, the probability that reinforcement is given, $\pi$, times the
probability that conditioning occurs, $\theta$.

Equation 5 is a first-order difference equation, similar to Eq. 1, and has
the solution

$$F(k, n) = (1 - \pi\theta)^{n-1}F(k, 1) + [1 - (1 - \pi\theta)^{n-1}]. \tag{6}$$

Equation 6, then, is a more general form of Eq. 2 that takes into account
the possibility of partial reinforcing schedules. For 100% reinforcement,
$\pi = 1$ and Eq. 6 is the same as Eq. 2.
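
As a rough check on Eq. 6, the fate of a single cue under partial reinforcement can be simulated directly. The sketch below is only an illustration under stated assumptions: the cue is treated as conditioned or not with probability $F(k, 1)$ on trial 1, reinforcement is drawn independently on each trial with probability $\pi$, and all names and parameter values are chosen here.

```python
import random


def simulate_cue(f1, theta, pi, n, rng):
    """Return True if a single cue is conditioned on trial n."""
    conditioned = rng.random() < f1  # state on trial 1
    for _ in range(n - 1):
        reinforced = rng.random() < pi
        # only a reinforced trial can condition an unconditioned cue
        if reinforced and not conditioned and rng.random() < theta:
            conditioned = True
    return conditioned


def eq6(f1, theta, pi, n):
    """Closed-form probability of conditioning on trial n (Eq. 6)."""
    decay = (1.0 - pi * theta) ** (n - 1)
    return decay * f1 + (1.0 - decay)


rng = random.Random(1)
f1, theta, pi, n = 0.5, 0.2, 0.5, 6  # illustrative values
observed = sum(simulate_cue(f1, theta, pi, n, rng) for _ in range(20000)) / 20000
predicted = eq6(f1, theta, pi, n)
```

The simulated proportion `observed` should lie close to `predicted`, and setting `pi = 1.0` in `eq6` reproduces Eq. 2.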
The restatement of the adaptation portion of the conditioning model as
an equation is identical with the foregoing treatment of the conditioning
portion: let $k'$ be any irrelevant cue in $K$ and let $A(k', n)$ be the probability
that $k'$ is adapted on trial $n$. Then

$$A(k', n + 1) = A(k', n) + \theta[1 - A(k', n)], \tag{7}$$

which has the solution

$$A(k', n) = (1 - \theta)^{n-1}A(k', 1) + [1 - (1 - \theta)^{n-1}]. \tag{8}$$

In the event that reinforcement occurs with probability $\pi$, Eq. 8 becomes

$$A(k', n) = (1 - \pi\theta)^{n-1}A(k', 1) + [1 - (1 - \pi\theta)^{n-1}]. \tag{9}$$

The adapting rate parameter $\theta$ is the same $\theta$ that appears in the
conditioning equations; hence adaptation is assumed to progress at the same
rate as conditioning.

A careful reading of the assumptions of the conditioning model reveals
that an irrelevant cue may become adapted whether conditioned or not;
thus it is possible in this model for a cue (albeit an irrelevant one) to
become unconditioned once conditioned. That a once conditioned, now
adapted irrelevant cue is now unconditioned follows also from the
assumption of the response model which says that adapted cues are not
effective in determining response probability.
Turning now to the response model, let $p_n$ be the probability of a CR
on trial $n$; then, translation of the statement of the response model
directly into a ratio gives

$$p_n = \frac{N[(\text{n-adapted}) \wedge (\text{conditioned})]}{N(\text{n-adapted})}, \tag{10}$$

where $N(x)$ and $N(\text{n-}x)$ denote the number of cues which are $x$ and
not $x$, respectively. Equation 10 is equivalent to

$$p_n = 1 - \frac{\tfrac{1}{2}(1 - \theta)^{n-1}}{r + (1 - r)(1 - \theta)^{n-1}}, \tag{11}$$

where $r$ is the proportion of cues in $K$ which are relevant. Derivation of
Eq. 11 from Eq. 10 proceeds as follows.

The numerator and denominator of Eq. 10 can be rewritten in
proportions by dividing each by the total number of cues in $K$, whatever that
number may be:

$$p_n = \frac{P[(\text{n-adapted}) \wedge (\text{conditioned})]}{P(\text{n-adapted})}.$$

Since there are two kinds of nonadapted cues, relevant cues which cannot
become adapted, and irrelevant cues which have not become adapted,
$P(\text{n-adapted})$ can be rewritten as

$$P(\text{relevant}) + P[(\text{irrelevant}) \wedge (\text{n-adapted})].$$

Let $r$ equal $P(\text{relevant})$; then $P(\text{irrelevant}) = 1 - r$, and $P(\text{n-adapted})$
becomes $r + (1 - r)[1 - A(k', n)]$. This last expression is the
denominator of $p_n$. The procedure for obtaining the numerator is similar:

$$
\begin{aligned}
P[(\text{n-adapted}) \wedge (\text{conditioned})]
&= P[(\text{relevant}) \wedge (\text{conditioned})] \\
&\quad + P[(\text{irrelevant}) \wedge (\text{n-adapted}) \wedge (\text{conditioned})] \\
&= rF(k, n) + (1 - r)[1 - A(k', n)]F(k', n).
\end{aligned}
$$

Note that in the two-choice problem, the probability $F(k', n)$ that the
irrelevant cue $k'$ is conditioned on trial $n$ is $\tfrac{1}{2}$. Substituting this into
the foregoing expression and then substituting the new versions of the
numerator and denominator into Eq. 10 gives

$$p_n = \frac{rF(k, n) + (1 - r)[1 - A(k', n)]\tfrac{1}{2}}{r + (1 - r)[1 - A(k', n)]}. \tag{10a}$$

At this point two initial conditions are introduced: (1) there are no
response biases, that is, $F(k, 1) = \tfrac{1}{2}$, and (2) initially none of the irrelevant
cues is adapted, that is, $A(k', 1) = 0$. If these two substitutions are now
made in Eqs. 2 and 8, respectively, and the results substituted into Eq. 10a,
$p_n$ becomes

$$p_n = \frac{r[\tfrac{1}{2}(1 - \theta)^{n-1} + 1 - (1 - \theta)^{n-1}] + (1 - r)(1 - \theta)^{n-1}\tfrac{1}{2}}{r + (1 - r)(1 - \theta)^{n-1}},$$

which, by adding and subtracting $\tfrac{1}{2}(1 - \theta)^{n-1}$ and rearranging, can be
rewritten as

$$p_n = \frac{r + (1 - r)(1 - \theta)^{n-1} - \tfrac{1}{2}(1 - \theta)^{n-1}}{r + (1 - r)(1 - \theta)^{n-1}}.$$

This final form is now seen to be the same as Eq. 11.

The curve described by Eq. 11, which is sigmoidal, begins at $p_1 = \tfrac{1}{2}$ and
approaches the asymptote $p_\infty = 1$. This means that the concept-utilization
problem of concern is assumed to be capable of complete solution with
perfect performance.
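
The algebra of this derivation can be checked numerically by computing $p_n$ in two ways: from the component probabilities of conditioning and adaptation, and from the final form of Eq. 11. The sketch below assumes the initial conditions stated above, $F(k, 1) = \tfrac{1}{2}$ and $A(k', 1) = 0$; the function names and the values of $r$ and $\theta$ are illustrative.

```python
def conditioned(theta, n, f1=0.5):
    """F(k, n) from Eq. 2 with no initial response bias."""
    decay = (1.0 - theta) ** (n - 1)
    return decay * f1 + (1.0 - decay)


def adapted(theta, n, a1=0.0):
    """A(k', n) from Eq. 8 with no irrelevant cue adapted initially."""
    decay = (1.0 - theta) ** (n - 1)
    return decay * a1 + (1.0 - decay)


def p_from_components(r, theta, n):
    """Ratio form: conditioned unadapted cues over all unadapted cues."""
    unadapted_irrelevant = (1.0 - r) * (1.0 - adapted(theta, n))
    numerator = r * conditioned(theta, n) + unadapted_irrelevant * 0.5
    denominator = r + unadapted_irrelevant
    return numerator / denominator


def p_eq11(r, theta, n):
    """Final form of the response probability (Eq. 11)."""
    x = (1.0 - theta) ** (n - 1)
    return 1.0 - 0.5 * x / (r + (1.0 - r) * x)


r, theta = 0.3, 0.2  # illustrative values
gaps = [abs(p_from_components(r, theta, n) - p_eq11(r, theta, n))
        for n in range(1, 31)]
```

The two computations should agree on every trial, with `p_eq11` starting at one-half on trial 1 and approaching unity as $n$ grows.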

Equation 11 involves two parameters, the proportion $r$ of relevant cues
in $K$ and the learning-rate parameter $\theta$. Bourne and Restle make the
assumption that $r = \theta$, thus reducing Eq. 11 to

$$p_n = 1 - \frac{\tfrac{1}{2}(1 - \theta)^{n-1}}{\theta + (1 - \theta)^{n}}. \tag{12}$$

This simplification leaves one parameter, which is usually estimated by
means of a formula involving the mean number of observed errors $E$.
The formula is straightforwardly developed on pp. 282 to 283 in Bourne
and Restle’s (1959) article, the final result being

$$E = \frac{\tfrac{1}{2}\log\theta}{(1 - \theta)\log(1 - \theta)}. \tag{13}$$

Applications of this model to two concept-utilization problems will now
be considered.
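
Since the formula for $E$ cannot be solved for $\theta$ in closed form, an estimate must be found numerically. The following is a minimal sketch of one way this might be done, by bisection; the procedure, the function names, and the search bounds are assumptions of this illustration and are not taken from Bourne and Restle's article. The sketch relies on the predicted number of errors decreasing as $\theta$ increases.

```python
import math


def mean_errors(theta):
    """Predicted mean number of errors for a given theta (Eq. 13)."""
    return 0.5 * math.log(theta) / ((1.0 - theta) * math.log(1.0 - theta))


def estimate_theta(observed_errors, lo=1e-9, hi=1.0 - 1e-9, iterations=200):
    """Find theta whose predicted mean errors match the observed mean.

    Uses bisection; predicted errors fall as theta rises.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mean_errors(mid) > observed_errors:
            lo = mid  # too many predicted errors: theta must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0


recovered = estimate_theta(mean_errors(0.2))
```

Feeding the function the number of errors predicted for a known $\theta$ should return that same $\theta$, which serves as a consistency check on the inversion.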


3.1 Relevant Redundancy

An immediate implication of the assumption that $\theta = r$ is that an
increase in the proportion of relevant cues should speed up learning.
Such an increase can be effected by adding relevant but redundant
dimensions to the problem. Let $R$ be the number of redundant relevant
dimensions and assume that each contributes the same number of cues; then
the number of relevant cues in $K$ should be proportional to $R$, that is,
$N(\text{relevant cues}) = cR$, where $c$ is a proportionality constant. Let
$a = N(\text{cues from irrelevant dimensions})$ and $bR = N(\text{relevant and irrelevant
cues from the redundant relevant dimensions})$; then the proportion of
relevant cues in $K$ is given by

$$r = \frac{cR}{a + bR}. \tag{14}$$

Now $R$, the number of redundant relevant dimensions, is known; hence
only $a$, $b$, and $c$ remain to be estimated. If $b$ is arbitrarily set equal to 1,
this reduces the number of constants to be estimated and makes the number
of cues contributed by a relevant dimension the unit of measurement.
The estimate used for $c$ is $c = \tfrac{1}{2}$. The data indicate that
$a$ is equal to $I$, the number of irrelevant dimensions
in the experimental setup. Taking this as
a working assumption in order to go further with the application,
Eq. 14 becomes

$$r = \frac{\tfrac{1}{2}R}{I + R}. \tag{15}$$

Fig. 1. Predicted and observed errors in a concept-utilization experiment
analyzed by a cue-conditioning model. Adapted with permission from
Bourne & Restle (1959, p. 285).

We then use this $r$, which is taken as equal to $\theta$, in Eq. 13 to predict the
number of errors for various numbers of redundant relevant ($R$) and
irrelevant ($I$) dimensions. Figure 1 shows the predicted and the observed
errors from the concept-utilization experiment described in the article. In
lieu of rigorous techniques for examining theory-data discrepancies, a
fair amount of appropriateness for the model might be admitted, recalling,
however, that (1) in developing Eq. 15 it was assumed that each dimension
contributes an equal number of cues and (2) the estimation procedures for
$c$ and $a$ were not presented by Bourne and Restle and hence whether or
not a further assumption was introduced is not known.

Up to this point, only the two-choice situation has been considered.
Bourne and Restle extend their model by treating the four-choice problem
as if it were composed of two independent two-choice problems. This
means that if $p_{1n}$ and $p_{2n}$ are the probabilities of CR’s on the two relevant
dimensions taken singly as two-choice problems, then the probability of a
CR in the four-choice problem is given by

$$p_{cn} = p_{1n}p_{2n}. \tag{16}$$

In the event that the separate two-choice problems are equally difficult,
$p_{cn}$ is given by

$$p_{cn} = p_{1n}^{2} = \left[1 - \frac{\tfrac{1}{2}(1 - \theta)^{n-1}}{r + (1 - r)(1 - \theta)^{n-1}}\right]^{2}. \tag{17}$$

In the two-choice situation, it was assumed that $\theta = r$; in the present
context, they assume that $\theta = r/2$. As in the two-choice situation, an
estimate of $\theta$ may be obtained and predictions made about the expected
number of errors for different numbers of irrelevant dimensions. The
only additional assumption involved in the extension is what might be
called the independence assumption, stated formally in Eq. 16.
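
The extension to four choices amounts to squaring the two-choice probability when the two component problems are equally difficult. The sketch below illustrates this under the assumption $\theta = r/2$ stated in the text; the function names and the value $r = 0.4$ are chosen only for the example.

```python
def p_two_choice(r, theta, n):
    """Two-choice probability of a CR on trial n (Eq. 11)."""
    x = (1.0 - theta) ** (n - 1)
    return 1.0 - 0.5 * x / (r + (1.0 - r) * x)


def p_four_choice(r, n):
    """Four-choice probability of a CR for equally difficult components (Eq. 17)."""
    theta = r / 2.0  # assumption stated in the text for the four-choice case
    return p_two_choice(r, theta, n) ** 2


curve = [p_four_choice(0.4, n) for n in range(1, 41)]
```

The resulting curve starts at one-fourth, the chance level for four alternatives, and rises toward unity.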


3.2 Additivity of Irrelevant Dimensions

On the assumption that all the dimensions of a problem are equally
effective and using the previous estimate of $c$, namely, $c = \tfrac{1}{2}$, $r$ can be
written

$$r = \frac{\tfrac{1}{2}R}{B + I + R}, \tag{18}$$

where $B = N(\text{residual or background or internal, irrelevant cues})$. Now
$B$ can reasonably be supposed to be constant in a given experimental
situation, and $R$ can be held fixed by the experimenter. Thus, after
appropriate algebraic manipulation, $1/r$ is found to be a linear function
of $I$:

$$\frac{1}{r} = \frac{2(R + B)}{R} + \frac{2}{R}I. \tag{19}$$

A value for $B$ for this experimental situation can be estimated by first
estimating $\theta$, which yields an estimate of $r$, and then entering the values
of $r$, $R$, and $I$ in Eq. 19. The resulting estimate of $B$ is then used in
predicting the value of $1/r$ for experiments with different values of $R$ and
$I$, the numbers of redundant relevant and nonredundant irrelevant
dimensions, respectively. Bourne and Restle report experimental results
involving both two- and four-choice problems which are in excellent
agreement with predictions from the model.

Bourne and Restle describe several further applications of their
cue-conditioning model. Again, the agreement between the model
and the data is very good and highly encouraging. This implies that the
model as a whole is appropriate for those variations of concept-utilization
problems so far investigated.
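
The linear relation between $1/r$ and the number of irrelevant dimensions, and the way a background constant $B$ might be backed out of an estimate of $r$, can be sketched as follows. The function names and the numerical values are hypothetical and serve only to illustrate the algebra of Eqs. 18 and 19.

```python
def proportion_relevant(R, I, B):
    """Proportion of relevant cues with c = 1/2 (Eq. 18)."""
    return 0.5 * R / (B + I + R)


def inverse_r_linear(R, I, B):
    """1/r written as a linear function of I (Eq. 19)."""
    return 2.0 * (R + B) / R + (2.0 / R) * I


def estimate_background(r, R, I):
    """Solve Eq. 18 for B given an estimate of r and known R and I."""
    return 0.5 * R / r - I - R


# Hypothetical values, used only to show that the three forms agree.
R, I, B = 2, 3, 1.5
r = proportion_relevant(R, I, B)
recovered_B = estimate_background(r, R, I)
```

Here `1 / r` and `inverse_r_linear(R, I, B)` give the same value, and `recovered_B` returns the background constant that was put in.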

Although Bourne and Restle do not make reference to reversal- and
nonreversal-shift tasks, it is tempting to assert that the model
straightforwardly predicts that the latter should be more difficult than the
former. This would seem to be deducible as follows. In the reversal situation,
the same cues are irrelevant as before and hence are already adapted, leaving
only reconditioning, whereas in the nonreversal situation, the now
relevant cues are adapted from the preshift task and the now irrelevant
cues are inappropriately conditioned with probability $\tfrac{1}{2}$. Such an argument
cannot be made, however, because there is no provision in the model for
the conditioning of previously adapted cues and no provision for the
extinction of previously conditioned relevant cues. The model has two
explicit initial conditions, namely, that there are no response biases and
that initially none of the irrelevant cues is adapted. Thus Bourne and
Restle’s model cannot be applied to reversal- and nonreversal-shift
problems in its present state.

Two final comments seem worthwhile. The first has to do with the
model’s inability to handle the variance observed in the data: once $\theta$ has
been estimated, the probability of a CR on trial $n$, $p_n$, is determined. Thus
every subject is required to have $p_n$ as his response probability, and
deviations from $p_n$ can only be random deviations with variance as prescribed
by the model. The fact that the mean number of errors, averaged over a
group of subjects, was used in estimating $\theta$ does not mean that $p_n$ is an
average. In a subsequent model proposed by Restle (1962), treated in the
next section, this defect is eliminated by making learning itself a random
event, a feature that allows the model to generate variability comparable
with observed variability.

The second comment is a common-sense observation that only prescient
subjects can adapt a proportion $\theta$ of the irrelevant cues and condition a
proportion $\theta$ of the unadapted cues on the first trial, or even on the second
or third trials. It seems unrealistic to assume, as is done implicitly in the
conditioning model, that subjects can “know” which cues are relevant and
irrelevant with sufficient certainty prior to complete solution to ensure
that a full proportion $\theta$ of each type are appropriately conditioned or
adapted.

4 CONCEPT UTILIZATION AS STRATEGY 

SELECTION 

A strategy is a rule according to which decisions are made. In a two-choice concept-utilization task, a subject may adopt the strategy of making response $R_1$ or $R_2$ according to whether the stimulus is red or green, respectively. If color is the relevant dimension, this strategy is either correct or wrong: correct if $R_1$ and $R_2$ have been designated by the experimenter as appropriate to red and green, respectively; wrong if they have been designated the other way around. If some other dimension is relevant and color is irrelevant, then this strategy is irrelevant.

A strategy need not be related to the stimuli presented; for example, a subject may decide to double-alternate the two possible responses, $R_1, R_1, R_2, R_2, \ldots$ Such a strategy is irrelevant. In a concept-utilization task with one relevant binary dimension, a rule for responding (a strategy) can be labeled formally as follows: (1) relevant if it is either correct or wrong, where it is (a) correct if it leads to a CR with probability 1 and (b) wrong if it leads to a CR with probability 0, or (2) irrelevant if it leads to a CR with some probability other than 1 and 0.
Strategies must specify under what conditions (or according to what plan) the alternative responses are to be made; that is, a strategy cannot be a rule for only a segment of the over-all task. For example, the strategy “make $R_1$ whenever the stimulus is red” is not permissible because it does not specify a course of action for occasions when the stimulus is not red. (Of course, if all stimuli are red, then it is implicit that $R_j$, $j \neq 1$, is never to be made, thus making the strategy acceptable, and irrelevant.) A permissible strategy would be “make $R_1$ whenever the stimulus is red, otherwise make $R_2$.” This simple strategy thus specifies a course of action for all possible stimulus situations.


At the risk of belaboring the issue, a final example is presented. Consider a concept-utilization task with two binary dimensions: form, with values circle and square, and color, with values green and red. Let color be the relevant dimension such that “green → $R_1$ and red → $R_2$” is the correct strategy. Suppose that the subject selects the strategy “green square → $R_1$, …” Because this strategy leads to a CR with a probability other than 0 or 1, it is labeled irrelevant.

Restle (1962) has proposed a mathematical model based on strategy selection. In the model, subjects in a concept-utilization task select strategies at random from a set of possible strategies. The subject retains the selected strategy as long as it leads to correct responses; if it leads to an error, the subject selects again. If he selects a wrong strategy, he makes an error with probability 1; if he selects an irrelevant strategy, he makes an error with some probability different from 0 and 1. Of course, if he selects a correct strategy, he retains it and never makes another error, since reselection only occurs when an error is made.

Restle's model is not easily assimilable into component axioms. The fundamental entities with which the model is concerned are strategies, not stimuli or cues, and hence there is no stimulus axiom; and since strategies are selected, not conditioned, there is no conditioning component to his model. In other words, the fundamental entities, strategies, do not pass from an initial state to some sort of conditioned state. Finally, because the selection of a strategy immediately implies a specific response (a response associated with a strategy is part of the definition of that strategy), a distinct response model is not apparent. In view of these difficulties, another organizational approach must be used.

STIMULUS SITUATION. Restle's strategy-selection model applies to stimulus situations about which a subject can form testable hypotheses regarding how to respond correctly, each response being designated correct or wrong by the experimenter.

FUNDAMENTAL TERM. The fundamental term is strategy. A strategy is a rule according to which the subject decides how to respond given any stimulus; a strategy yields a consistent pattern of responses to the stimuli of the task situation.

CLASSIFICATION OF STRATEGIES. Let $H$ be the set of strategies involving the permissible responses in the task. Then, according to which responses the experimenter labels correct or wrong, $H$ is partitioned into three subsets:

$C$ = {strategies which always lead to a CR},

$W$ = {strategies which always lead to a wrong response},

$I$ = {strategies which can lead to either type of response}.

The strategies falling into $C$, $W$, and $I$ are uniquely determined by the experimenter's labeling rules; a change in his rules (for example, in reversal- and nonreversal-shift problems) induces a new partition on $H$.

ASSUMPTION. The subject selects at random a strategy from $H$; if the response is labeled an error by the experimenter, the subject again selects at random from $H$, sampling with replacement; if the response is labeled as correct, he retains the strategy.

This assumption has two essential parts: (1) strategy selection is random with replacement, and (2) selection is made only after an error, except on the first trial. The first part, random selection with replacement, means that the proportions of $C$, $W$, and $I$ strategies in $H$ remain constant over trials; in other words, every time an error is made, the subject is conceived of as returned to an initial state: the concept-identification task is reset, so to speak, to zero. No provision is made for a change in the sampling probability of the strategy just rejected; the subject has the same probability of redrawing that strategy as he had of drawing it initially on the first trial. Thus, although certain mathematical complexities (which would follow were the sampling probabilities of previously sampled strategies allowed to vary over trials) are avoided, the subject is theoretically conceived of as not remembering previously selected and rejected strategies. The second part, selection only after an error, means, by definition of $C$, that once the subject selects a strategy from $C$, he makes no more errors. The occurrence of an error therefore necessarily requires that the strategy selected not be a $C$ strategy. On the other hand, the occurrence of a CR does not necessarily mean that the strategy selected is a $C$ strategy, since utilization of an $I$ strategy may also lead to a CR. For this reason, derivations within the model are generally in terms of errors.

The model can be expressed as two equations. Let $c$, $w$, and $i$ represent the proportions of $C$, $W$, and $I$ strategies, respectively, in $H$. Because $C$, $W$, and $I$ partition $H$, $c + w + i = 1$. Let $p_j$ be the probability that an error occurs for the first time on the $j$th trial. Note that given an error on trial $n$ for any $n$, $p_j$ is the probability that the next error will occur on trial $n + j$. This is because following an error the strategy that led to the error is returned to $H$ and the subject selects a new strategy at random, which is precisely the situation the subject was in on trial 1. For the two-choice concept-utilization task, the two equations are

\[ p_1 = w + \tfrac{1}{2} i \]

and

\[ p_j = i \left(\tfrac{1}{2}\right)^{j}, \quad \text{for } j \ge 2. \tag{20} \]

The first equation follows immediately from the experimenter's labeling procedures and from the definition of irrelevancy, for the probability of an error on the first trial (or on the first trial after an error) is the probability of drawing a $W$ strategy plus the probability of drawing an $I$ strategy and its not being labeled a CR, which is assumed by Restle to be $\tfrac{1}{2}$ in the two-choice situation. (The implications of this last assumption will be discussed later.) The second equation can be evolved recursively as follows. Let $\Pr(\mathrm{CR}_i \mid \mathrm{CR}_{i-1} \text{ and } I_{i-1})$ read “the probability of a CR on trial $i$ given a CR on trial $i - 1$ and an irrelevant strategy on trial $i - 1$.” Then

\[ p_2 = \Pr(\mathrm{CR}_1 \mid I_1)\Pr(I_1)\Pr(\mathrm{error}_2 \mid I) = \tfrac{1}{2}\, i \left(\tfrac{1}{2}\right) = i\left(\tfrac{1}{2}\right)^{2}, \]

\[ p_3 = \Pr(\mathrm{CR}_2 \mid \mathrm{CR}_1 \text{ and } I_1)\Pr(\mathrm{CR}_1 \mid I_1)\Pr(I_1)\Pr(\mathrm{error}_3 \mid I) = \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)(i)\left(\tfrac{1}{2}\right) = i\left(\tfrac{1}{2}\right)^{3}, \]

\[ p_j = \Pr(\mathrm{CR}_{j-1} \mid \mathrm{CR}_{j-2} \text{ and } I_{j-2}) \cdots \Pr(\mathrm{CR}_1 \mid I_1)\Pr(I_1) \times \Pr(\mathrm{error}_j \mid I) = i\left(\tfrac{1}{2}\right)^{j}. \]

Thus Eq. 20 expresses the assumptions of the model plus the assumption that $\Pr(\mathrm{CR} \mid I) = \tfrac{1}{2}$ in the two-choice situation, and hence describes mathematically the concept-utilization process hypothesized by Restle.

A comment concerning the assumption that $\Pr(\mathrm{CR} \mid I) = \tfrac{1}{2}$ in the two-choice situation is in order. According to this assumption, either (1) exactly $\tfrac{1}{2}$ of the $I$ strategies are wrong and all $I$ strategies have the same sampling probability, or (2) some fraction other than $\tfrac{1}{2}$ are wrong and the distribution of sampling probabilities is not uniform. A similar statement can be made for the situation involving $r$ response alternatives, where the assumption that $\Pr(\mathrm{CR} \mid I) = 1/r$ would presumably be made by Restle. It seems difficult to make psychological sense out of this restriction in view of the definition of a strategy. Because strategies are rules according to which decisions are made, it seems reasonable that the value of $\Pr(\mathrm{CR} \mid I)$ should depend on the stimulus situation as well as the number of response alternatives. The equality $\Pr(\mathrm{CR} \mid I) = \tfrac{1}{2}$ for the two-choice problem is used in a number of derivations of new statements within the model and hence must be considered as one of the axioms of the model.
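
As a numerical illustration of Eq. 20, the following minimal sketch tabulates the first-error probabilities $p_j$ with exact fractions. The function name and the particular proportions are hypothetical, chosen only for illustration; the one substantive assumption is the one named in the text, $\Pr(\mathrm{CR} \mid I) = \tfrac{1}{2}$.

```python
from fractions import Fraction

def first_error_probs(w, i, n_trials):
    """Return [p_1, ..., p_n] under Eq. 20 for the two-choice task.

    w and i are the proportions of wrong and irrelevant strategies in H;
    Pr(CR | I) = 1/2 is the assumption discussed in the text.
    """
    half = Fraction(1, 2)
    probs = [w + half * i]                 # p_1 = w + i/2
    for j in range(2, n_trials + 1):
        probs.append(i * half ** j)        # p_j = i (1/2)^j, j >= 2
    return probs

# Hypothetical proportions (illustrative values only, not from any experiment).
c, w, i = Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)
probs = first_error_probs(w, i, 40)

# The p_j sum to 1 - c in the limit: with probability c no error ever occurs.
shortfall = (1 - c) - sum(probs)
print(float(probs[0]), float(probs[1]), float(shortfall))
```

The partial sum falls short of $1 - c$ only by the tail of the geometric series, which is one way to see that the two equations together account for every outcome except the immediate selection of a $C$ strategy.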

4.1 Expected Number of Errors

As an illustration of how new statements may be derived from the assumptions, consider the model's prediction of the number of errors (selection of $W$ or $I$ strategies) a subject will make before selecting a $C$ strategy. Since $c$ is the proportion of $C$ strategies in $H$, $c$ is the probability of no further errors following an error. The probability of exactly one more error, then, is $(1 - c)c$: the probability $1 - c = w + i$ of first drawing either a $W$ or $I$ strategy times the probability $c$ of then drawing a $C$ strategy. In general, the probability of exactly $k$ errors is $(1 - c)^k c$. Therefore the expected value of $k$ is the expected number of errors:

\[ E(k) = \sum_{k=0}^{\infty} k (1 - c)^k c = c(1 - c) \sum_{k=1}^{\infty} k (1 - c)^{k-1} = c(1 - c)\frac{1}{c^2} = \frac{1 - c}{c}. \tag{21} \]

Thus the model predicts that a subject will make on the average $(1 - c)/c$ errors during the course of learning to utilize the correct concept.

Note that in order to predict the number of errors in an actual experimental situation, the parameter $c$ must be known. Since other statements involving $c$ can also be derived within the model, the procedure for obtaining a numerical value for $c$ is to expend one of them in estimation. Suppose that of a number of derived statements, Eq. 21 is chosen to estimate $c$. Let $\bar{T}$ denote the mean (over a number of subjects from a homogeneous population) number of errors actually made in the experiment; now substitute $\bar{T}$ for $E(k)$ in Eq. 21 and solve for $c$. Thus the estimate of $c$ is given by

\[ \hat{c} = \frac{1}{\bar{T} + 1}. \tag{22} \]

It should be noted that $\hat{c}$ slightly overestimates $c$. This is because $\bar{T} \le E(k)$, which is due to the fact that in an actual experiment only a finite number of trials is given. In other words, $E(k)$ is influenced by extreme outcomes like prolonged correct responding under an $I$ strategy, whereas $\bar{T}$ is not.
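
Equations 21 and 22 can be checked numerically with a short sketch. The function names and the value $c = 0.2$ are hypothetical and serve only to show that the closed form agrees with a partial sum of the series and that Eq. 22 inverts Eq. 21.

```python
def expected_errors(c):
    """Eq. 21: expected number of errors, (1 - c) / c."""
    return (1.0 - c) / c

def estimate_c(mean_errors):
    """Eq. 22: estimate of c from the observed mean number of errors."""
    return 1.0 / (mean_errors + 1.0)

def expected_errors_by_series(c, n_terms=10_000):
    """Partial sum of k (1 - c)^k c, as a numerical check on Eq. 21."""
    return sum(k * (1.0 - c) ** k * c for k in range(n_terms))

c = 0.2  # hypothetical value
print(expected_errors(c), expected_errors_by_series(c))
print(estimate_c(expected_errors(c)))
```

With a true mean number of errors the estimator returns $c$ itself; the overestimation noted in the text arises only because an observed mean from a finite number of trials tends to fall below $E(k)$.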

The estimate of $c$ given in Eq. 22 is also the maximum likelihood estimator for $c$. Suppose a group of $N$ subjects makes $X$ errors during the concept-utilization task. The probability, or likelihood, of such an event when $c$ is the proportion of $C$ strategies in $H$ is

\[ L(X \text{ errors}, c) = (1 - c)^X c^N, \]

the probability of $X$ errors during the presolution period. The question is, what value of $c$ maximizes this likelihood? Since the function $\log L$ has its maximum at the same value of $c$ as does $L$, the foregoing likelihood is first rewritten as

\[ \log L = \log\left[(1 - c)^X c^N\right] = X \log(1 - c) + N \log c. \]

Proceeding with maximization as usual,

\[ \frac{\partial \log L}{\partial c} = -\frac{X}{1 - c} + \frac{N}{c}, \]

which, when set equal to zero and solved for $c$, gives

\[ \hat{c} = \frac{N}{X + N} = \frac{1}{(X/N) + 1}. \]

However, since $X$ is the number of errors made by $N$ subjects, $X/N$ is the mean number of errors made, or $\bar{T}$. Therefore

\[ \hat{c} = \frac{1}{\bar{T} + 1} \]

is the maximum likelihood estimator for $c$.
It is of interest that the model, as formalized in Eq. 20, also permits a maximum likelihood estimate of $w$, the proportion of $W$ strategies in $H$. To obtain the likelihood function, begin by noting that before a $C$ strategy is selected and no more errors are made, the following conditional probabilities are immediate:

\[ \Pr(\mathrm{error}_{n+1} \mid \mathrm{error}_n) = w + \tfrac{1}{2} i = \frac{1 - c + w}{2}, \]

\[ \Pr(\mathrm{CR}_{n+1} \mid \mathrm{error}_n) = \tfrac{1}{2} i = \frac{1 - c - w}{2}. \]

The former is simply the first equation of Eq. 20. Regarding the latter, in order to make a CR in the presolution period, that is, prior to selecting a $C$ strategy, the subject must draw an $I$ strategy, which then leads to a CR with probability $\tfrac{1}{2}$. Now if $T + 1$ is the total number of errors, counting “trial 0” as an error, then $T + 1 = M_1 + M_0$, where $M_1$ is the number of errors followed by another error and $M_0$ is the number followed by a CR. For a given subject, then, one can speak of $T + 1$ errors, $M_1$ of which are followed by another error and $M_0$ of which are followed by a CR. Clearly, $T + 1$, $M_1$, and $M_0$ can be counted in the subject's protocol. By using the preceding conditional probabilities, the likelihood of such an outcome for the given subject can be written as

\[ L(T + 1 \text{ errors}, M_1 \text{ followed by another error}, M_0 \text{ by a CR}) = [\Pr(\mathrm{error} \mid \mathrm{error})]^{M_1} [\Pr(\mathrm{CR} \mid \mathrm{error})]^{M_0 - 1} c = \left(\frac{1 - c + w}{2}\right)^{M_1} \left(\frac{1 - c - w}{2}\right)^{M_0 - 1} c, \]

where $c$ is the probability that no more errors are made. The reason for making the exponent of the second factor $M_0 - 1$ is that a CR on the first trial, that is, a CR following the trial 0 error, is not to be counted, since the number of errors $T$ counted from the subject's protocol does not include trial 0; thus $M_1 + (M_0 - 1) = T$, as required. The likelihood of a whole set of such data for $N$ subjects is

\[ L = \prod_{s=1}^{N} L_s = \left(\frac{1 - c + w}{2}\right)^{\sum_s M_{1,s}} \left(\frac{1 - c - w}{2}\right)^{\sum_s (M_{0,s} - 1)} c^{N}. \]

If this likelihood function is maximized simultaneously with respect to $w$ and $c$, it turns out that

\[ \hat{w} = \frac{\bar{M}_1 - \bar{M}_0 + 1}{\bar{T} + 1} \quad \text{and} \quad \hat{c} = \frac{1}{\bar{T} + 1}, \]

where $\bar{M}_1$, $\bar{M}_0$, and $\bar{T}$ are the means of $M_1$, $M_0$, and $T$ over the $N$ subjects. Thus both $c$ and $w$ are capable of being estimated. This means that because $i = 1 - c - w$, and hence $\hat{\imath} = 1 - \hat{c} - \hat{w}$, the structure of $H$ can be estimated in concept-utilization problems for which the model is appropriate.


4.2 Additivity of Cues

Consider a concept-utilization task involving at least two binary dimensions $D_1$ and $D_2$. Suppose there are three experimental groups such that for Group 1 only $D_1$ is relevant, for Group 2 only $D_2$, and for Group 3 $D_1$ and $D_2$ are both relevant but redundant; in each case all the remaining dimensions are irrelevant. Thus all three groups have a two-choice problem. If $c_1$ and $c_2$ are the proportions of $C$ strategies in $H$ for Groups 1 and 2, respectively, then Restle asserts that $c_3 = c_1 + c_2$ should be the proportion of $C$ strategies in $H$ for Group 3. This equality is not presented as a theorem within the model, but merely as what Restle (1962) calls “. . . the simplest interpretation of the experiment” (p. 340). It is of importance to notice, however, that the assertion $c_3 = c_1 + c_2$ involves a shift from speaking about cues (values on dimensions) to speaking about strategies. The model provides no formal bridge by means of which manipulations of the stimulus situation (making $D_1$ and $D_2$ redundant) can be translated into manipulations of subsets of $H$ (making $c_3 = c_1 + c_2$).

The plausibility of adding subsets of $H$ in a manner corresponding to cue addition arises from the intuition that if strategies involving hypotheses about certain cues are irrelevant when those cues are irrelevant, then if the cues become relevant so should the associated strategies. What is intended could, of course, be stated completely in terms of strategies and their combinations. This would not be convenient, however, because such a formulation would make no direct reference to the stimulus situation and thus would make the analysis of the cue-additivity problem very difficult within the model. It is when we attempt to apply the model to a variant stimulus situation that we realize there is very little contact between the model and the stimulus situation, the only translation of the model (the set $H$ and its structural properties) into the real world being the error equations.

Restle (1962) cites data from several studies in which $c_3$ is closely approximated by $c_1 + c_2$. In one of these, a response-versus-place learning study by Scharlock (1955), rats were run on an elevated T-maze. Group 1 had to learn to make a particular turn regardless of which goalbox had a light over it, Group 2 had to learn to run to the goalbox with the light over it regardless of whether it was to the right or left, and Group 3 had to learn to run to a lighted goalbox which was always on the same side. Group 1 subjects were thus required to utilize turning behavior in order to solve the problem; Group 2 subjects, light-selecting behavior; and Group 3 subjects, either or both, since for them place and response cues were redundant. In 28 trials, the mean number of errors made by the subjects of Groups 1 and 2 were 6.7 and 9.7, respectively. If these values are substituted into Eq. 22, the estimates of $c$ for the two groups are $\hat{c}_1 = 1/7.7 = 0.13$ and $\hat{c}_2 = 1/10.7 = 0.09$. Then $\hat{c}_3 = 0.13 + 0.09 = 0.22$, and hence for Group 3 the expected number of errors as given by Eq. 21 is $E(k) = 0.78/0.22 = 3.54$. The mean number of errors actually observed for Group 3 was 4.01, “which is adequately close” (Restle, 1962, p. 340).

In a discrimination problem for cats, Warren (1959) reports 11.13 errors for position relevant, 28.63 for object relevant, and 8.00 for object and position relevant and redundant. Estimates of $c_1$ and $c_2$ are calculated from the first two values: $\hat{c}_1 = 1/12.13 = 0.082$ and $\hat{c}_2 = 1/29.63 = 0.034$. Then $\hat{c}_3 = 0.082 + 0.034 = 0.116$, and hence by Eq. 21 it must be that $E(k) = 0.884/0.116 = 7.62$, which is to be compared with the observed 8.00. As will be seen in Sec. 5, where the Bower and Trabasso (1963) model is discussed, prediction of the cue-additivity data of the foregoing experiments can be considerably improved.
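
The two hand calculations above can be retraced with a short sketch. The function names are hypothetical, and the optional rounding step is an assumption introduced only to mimic the intermediate rounding of the estimates in the text; the formulas themselves are Eqs. 21 and 22 together with the asserted equality $c_3 = c_1 + c_2$.

```python
def estimate_c(mean_errors):
    """Eq. 22: c-hat = 1 / (mean errors + 1)."""
    return 1.0 / (mean_errors + 1.0)

def predict_redundant_errors(errors_1, errors_2, digits=None):
    """Predict mean errors for the redundant-cue group via c3 = c1 + c2 and Eq. 21.

    If digits is given, each single-cue estimate is rounded to that many
    decimals first, mimicking the hand calculation in the text.
    """
    c1, c2 = estimate_c(errors_1), estimate_c(errors_2)
    if digits is not None:
        c1, c2 = round(c1, digits), round(c2, digits)
    c3 = c1 + c2
    return (1.0 - c3) / c3

scharlock = predict_redundant_errors(6.7, 9.7, digits=2)     # text: 3.54, observed 4.01
warren = predict_redundant_errors(11.13, 28.63, digits=3)    # text: 7.62, observed 8.00
scharlock_unrounded = predict_redundant_errors(6.7, 9.7)
print(scharlock, warren, scharlock_unrounded)
```

Without the intermediate rounding the Scharlock prediction comes out nearer 3.48 than 3.54, so the printed figure partly reflects rounding in the hand calculation; in either case the prediction lies below the observed 4.01.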


4.3 A Modification

The seeming anachronism notwithstanding, an interesting modification of the present model (Restle, 1961, 1962) was proposed in 1960 by Restle, which involves a specific probability $r$ that reselecting will follow an error. The new assumption is: the subject selects at random a strategy from $H$; if the response is labeled an error by the experimenter, with probability $r$ the subject again selects (sampling with replacement) at random from $H$, and with probability $1 - r$ he retains the non-$C$ strategy; if the response is labeled correct, he retains the strategy with probability 1, as before. The original model is obtained by setting $r = 1$.

This modification was intended to handle the possibility that the subject might perseverate, that is, that he might stick with a strategy even though it led to an error. The probability that perseveration occurs on an error trial is $1 - r$; perseveration on nonerror trials is assumed to occur with probability 1 both here and in the original model, since a strategy that leads to a CR is always retained. The term “perseveration” is therefore used only in reference to retaining non-$C$ strategies. If no perseveration exists, an estimate of $r$ will be approximately 1.

In this modified model, the probability that a given error is the last is $rc$, the probability $r$ that the subject chooses a new hypothesis times the probability $c$ that it is correct. Proceeding in a fashion similar to the original development, let $f(k)$ be the probability of exactly $k$ errors. Then

\[ f(0) = c, \]
\[ f(1) = (1 - c)rc, \]
\[ f(2) = (1 - c)(1 - rc)rc, \]
\[ f(3) = (1 - c)(1 - rc)^2 rc, \]
\[ f(k) = (1 - c)(1 - rc)^{k-1} rc, \quad \text{for } k > 0. \tag{23} \]

This distribution is similar to the geometric distribution, which was the distribution of the number of errors in the original model, and a similar argument shows it to have the mean

\[ E(k) = \frac{1 - c}{rc}. \tag{24} \]
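
A brief numerical check of Eqs. 23 and 24 is sketched below. The function names and the parameter values are hypothetical; the code only sums the stated distribution far enough into its tail to confirm that it is a proper distribution with the stated mean, and that $r = 1$ recovers Eq. 21.

```python
def error_pmf(k, c, r):
    """Eq. 23: probability of exactly k errors in the modified model."""
    if k == 0:
        return c
    return (1.0 - c) * (1.0 - r * c) ** (k - 1) * r * c

def mean_errors(c, r):
    """Eq. 24: expected number of errors, (1 - c) / (r c)."""
    return (1.0 - c) / (r * c)

c, r = 0.25, 0.5  # hypothetical parameter values
ks = range(5000)
total_mass = sum(error_pmf(k, c, r) for k in ks)
series_mean = sum(k * error_pmf(k, c, r) for k in ks)
print(total_mass, series_mean, mean_errors(c, r))
```

Because $r$ enters the mean only through the product $rc$, a small $r$ (strong perseveration) inflates the expected number of errors in the same way as a small proportion of correct strategies would.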


The model permits maximum likelihood estimation of $r$ and $c$. Consider a group of $N$ subjects who together make $X$ errors, $N_0$ of whom make zero errors. The likelihood of such an event is given by

\[ L = c^{N_0}\left[(1 - c)(1 - rc)^{t-1}(rc)\right]^{N - N_0}, \]

where $c^{N_0}$ is the probability of $N_0$ subjects selecting a $C$ strategy at the outset, and the remaining factor is the probability of $N - N_0$ subjects each making $t$ errors. Now if $N - N_0$ subjects make all $X$ of the errors, then $t$ must be approximately $X/(N - N_0)$. If this substitution is made, the foregoing likelihood function becomes

\[ L = c^{N_0}\left[(1 - c)(1 - rc)^{X/(N - N_0) - 1}(rc)\right]^{N - N_0} = c^{N_0}(1 - c)^{N - N_0}(1 - rc)^{X - N + N_0}(rc)^{N - N_0}. \]

Maximizing $L$ with respect to $c$ and $r$ simultaneously gives

\[ \hat{r} = \frac{N(N - N_0)}{N_0 X} \]

and

\[ \hat{c} = \frac{N_0}{N}. \]

Thus, since $r$ and $c$ can be estimated and because a number of statements, such as Eq. 24, can be derived within the model, an evaluation of the perseveration hypothesis is possible in concept-utilization situations for which the strategy-selection model is appropriate. Restle (1960) cites several studies for which this modified model seems appropriate. For example, in an application of the model to a discrimination experiment with infant monkeys reported by Harlow (1959), Restle estimated $r$ to be 0.50, and $c$ was found to increase with age. This analysis suggests that with an increase in age there is a concomitant increase in the proportion of $C$ strategies in $H$ but not a change in the willingness to give up a disconfirmed strategy, an interpretation entirely consonant with Harlow's (1960) own views.

In summary, the strategy-selection model is concerned with entities, strategies, which have no referents in the stimulus situation. This leads to difficulties when attempting to make applications involving variations in the stimulus field, as in cue-additivity experiments. When such applications are attempted, the arguments leading to predictions can no longer be made completely within the model, and hence the resulting comparisons between theory and data are not strictly legitimate.

5. CONCEPT UTILIZATION AS SELECTION AND CONDITIONING

Consider a concept-utilization task where there are $m$ independent binary dimensions under the experimenter's control; then there are $2m$ distinct cues, 2 from each of the $m$ dimensions, and $2^m$ distinct categories to which the stimuli of the task can be assigned. Thus, in the ordinary situation, each stimulus has $m$ attributes, or, more precisely, carries $m$ cues. For example, let $m = 3$, where the three dimensions are color (red, green), form (circle, square), and size (large, small); then one of the stimuli is a large red circle and thus carries the $m = 3$ cues of largeness, redness, and circularity. If the relevant dimension is color and if the two response alternatives are $R_1$ and $R_2$ such that red → $R_1$ and green → $R_2$ are the associations to be acquired for correct utilization of the concept color, then when the large red circle is presented, the subject must learn to select redness from among largeness, redness, and circularity and to associate $R_1$ with it; that is, he must categorize the stimulus as red (as opposed to other possible categorizations) and associate $R_1$ with that category.

Hypothesizing that this is the way in which subjects actually proceed when faced with a concept-utilization task, Bower and Trabasso (1964) constructed a model that formalizes the selection and association processes. In their model, categories are sampled according to their “attention-getting” properties and the subject shifts from an unconditioned state, where the response alternatives are chosen at random, to a conditioned state, where the correct response alternative is completely conditioned to the appropriate category. Thus, in the preceding example, the set of all stimuli that are red is the category with which $R_1$ must be associated. The shift from not associated to associated is probabilistic and contingent on the selection of the appropriate category. The component axioms may be stated as follows.

Stimulus axioms

(i) The stimuli can be classified into distinct categories such that each stimulus belongs to more than one category but not to all categories.

(ii) The set of categories is partitioned according to the experimenter's response-labeling procedures into relevant and irrelevant categories.

(iii) The probability of selecting a relevant category from among all the categories to which the presented stimulus belongs is

\[ r = \frac{\sum_{i \in R} w_i}{\sum_{i \in R} w_i + \sum_{j \in I} w_j}, \tag{25} \]

where the $w$'s are the weights or measures (“attention-getting” values) of the relevant ($R$) and irrelevant ($I$) categories.

Conditioning axioms

(i) On each trial the subject is in one of two states: $U$, an initial unconditioned state, or $C$, a terminal conditioned state.

(ii) On a trial on which the subject has selected the correct category, there is a probability $\theta$ that the correct response is conditioned to that category; once the correct response is conditioned, it remains so.

(iii) Initial condition: all subjects begin in $U$.

Response axioms

(i) In state $U$, the probability of a correct response is some chance probability $p$.

(ii) In state $C$, the probability of a correct response is 1.

In this model only one critical event is entailed, namely, the conditioning of a response to its appropriate category. This is a compound event with probability $c = r\theta$, the joint probability that the relevant aspect of the presented stimulus is selected and that the correct response becomes conditioned to it. Since the parameter $c$ is estimated from the data, and because it is the product of $r$ and $\theta$, it should be pointed out that there are at least two sources of estimation error. First, $r$ is variable across subjects to the extent that perceptual identification skills play a role in category selection; second, $\theta$ varies to the extent that subjects differ in responding skills. The model does not allow for such differences; in fact, it is applied and is evoked in connection with group data only, as if for identical subjects. Therefore the appropriateness of the model to a concept-identification situation is directly related to the extent to which perceiving and responding skills are at the same asymptote for all subjects.

A subject is seen by the model as making two responses only: (1) selecting a category and (2) associating a response with it. Each subject on each trial has probability $c$ of moving from $U$, the initial unconditioned state, to $C$, the terminal conditioned state. Viewed in this way, the model is formally identical with Bower's one-element model; in other words, both are two-state Markov chains characterized by initial and terminal response probabilities. An implication of two-state Markovian models is that as long as the subject is in $U$, that is, over the trials prior to shifting into $C$, the responses should follow the binomial distribution with a fixed parameter, the probability that $R_1$ will be selected over $R_2$ on each trial prior to shifting into $C$. Trials on which (1) only two alternatives are possible, (2) the probabilities of the alternatives are fixed over trials, and (3) the trials are mutually independent are called independent Bernoulli trials. Bower and Trabasso report very impressive data demonstrating that, in the concept-utilization task they use, the trials prior to the last error are independent Bernoulli trials. This condition is an important go-ahead signal regarding the appropriateness of the model for the particular situation. An example of such data is shown in Fig. 2.

Over 200 subjects were trained on . . . single classification problems. Different subjects had differing combinations of relevant attributes; this affects the learning rate but is immaterial to deciding whether performance is constant prior to the subject's final error. . . . (Bower & Trabasso, 1963, pp. 36-37)

Fig. 2. Percent successes prior to the last error plotted in blocks of five trials. Adapted with permission from Bower & Trabasso (1963, p. 36).

It is relevant at this point to mention the findings of Suppes and Ginsberg (1963). They report evidence to support the possibility that the kind of stationarity of response probability prior to the last error shown in Fig. 2 may be an artifact of the method of organizing the data. It may be that response probability increases incrementally, so that on any trial the mean response probability is actually a composite of high and low response probabilities: high for subjects about to shift from $U$ to $C$, low for subjects who will not be shifting from $U$ to $C$ until somewhat later in the problem. If such is, in fact, so, then Vincentizing the presolution data should show an increase in response probability instead of stationarity. Suppes and Ginsberg performed this analysis on a number of sets of data that have the property of stationarity when reported in the manner of Fig. 2. They found systematic increases in response probability as the critical event (solution) was approached. Thus caution is in order when model evaluation is in terms of presolution stationarity of response probability.

The state-to-state transition probabilities of the present model can be written in matrix form:

\[ P = \begin{array}{c|cc} & U & C \\ \hline U & 1 - c & c \\ C & 0 & 1 \end{array} \]

Thus, for example, on any trial the probability of remaining in state $U$ is $1 - c$; the probability of shifting to $C$ is $c$. Note that $C$ is absorbing. The $n$th power of $P$ gives the probabilities of moving from one state to another in exactly $n$ trials:

\[ P^n = \begin{array}{c|cc} & U & C \\ \hline U & (1 - c)^n & 1 - (1 - c)^n \\ C & 0 & 1 \end{array} \]
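
The closed form for $P^n$ can be confirmed by repeated multiplication. The sketch below uses exact fractions and plain nested lists; the helper names and the value $c = 1/4$ are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Multiply two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transition_power(c, n):
    """Return P^n for P = [[1 - c, c], [0, 1]] (states ordered U, C)."""
    p = [[1 - c, c], [0, 1]]
    result = [[1, 0], [0, 1]]          # identity matrix
    for _ in range(n):
        result = mat_mul(result, p)
    return result

c = Fraction(1, 4)   # hypothetical learning rate
p5 = transition_power(c, 5)
print(p5[0][0], p5[0][1])   # (1 - c)^5 and 1 - (1 - c)^5
```

The bottom row stays at (0, 1) for every $n$, which is the matrix expression of the statement that $C$ is absorbing.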

To apply the model to a particular concept-utilization situation requires an estimate of the parameter $c$. Although parameters $r$ and $\theta$ are used in deriving a number of statements from the assumptions, they seldom appear separately in the final form of the statement; $r$ and $\theta$ nearly always occur in statements jointly as $r\theta = c$. Estimation of $c$ is rather straightforward and does not involve either $r$ or $\theta$. In Sec. 2, where Suppes and Ginsberg's application of Bower's one-element model was treated, it was shown that if the probability of moving from $U$ to $C$ were a constant $c$ for each trial, then the expected number of trials before the shift occurs is $1/c$. In other words, $1/c$ trials should be required before the critical event occurs. However, for each of these trials, there is a constant probability $p$ that the subject makes a correct response simply by guessing. Therefore, of the $1/c$ trials prior to conditioning, $p/c$ of them should be correct responses and $(1 - p)/c$ should be errors. Now $p$ can easily be estimated: since learning is assumed to be all-or-none, performance prior to the critical event should be at a chance level; hence $p$ can be taken as the observed proportion of successes prior to the last error, denoted $\hat{p}$. From within the model, then, the expected number of errors is

\[ E(\text{errors}) = \frac{1 - p}{c}. \tag{26} \]

From the data, we estimate $p$ and find, over subjects, the average number of errors $\bar{T}$. Substituting $\bar{T}$ for $E(\text{errors})$, we have as the estimate of $c$,

\[ \hat{c} = \frac{1 - \hat{p}}{\bar{T}}. \tag{27} \]

Equation 27 can also be derived by means of the maximum likelihood technique. In a given concept-utilization experiment, the protocol of the $i$th subject consists of a sequence of errors and CR's. Prior to the critical event, suppose subject $i$ makes $T_i$ errors and $Z_i$ CR's. The likelihood of this outcome is

\[ L_i = p^{Z_i}(1 - p)^{T_i}(1 - c)^{Z_i + T_i - 1} c, \]

where $p^{Z_i}$ is the probability of the $Z_i$ CR's, $(1 - p)^{T_i}$ the probability of the $T_i$ errors, $(1 - c)^{Z_i + T_i - 1}$ the probability of being in $U$ for all $Z_i + T_i$ trials except the last, and $c$ the probability of shifting into $C$ on the last. The likelihood of $N$ subjects producing protocols with $T$ errors and $Z$ presolution CR's is

\[ L = \prod_{i=1}^{N} L_i = p^{Z}(1 - p)^{T}(1 - c)^{Z + T - N} c^{N}, \]

where

\[ Z = \sum_{i=1}^{N} Z_i \quad \text{and} \quad T = \sum_{i=1}^{N} T_i. \]

Converting $L$ to $\log L$ gives

\[ \log L = Z \log p + T \log(1 - p) + (Z + T - N)\log(1 - c) + N \log c. \]

Maximizing with respect to $p$ and $c$ simultaneously,

\[ \frac{\partial \log L}{\partial p} = \frac{Z}{p} - \frac{T}{1 - p} = 0, \]

\[ \frac{\partial \log L}{\partial c} = \frac{N}{c} - \frac{Z + T - N}{1 - c} = 0. \]

The former yields $\hat{p} = Z/(Z + T)$, the proportion of presolution trials which are CR's, whereas the latter yields $\hat{c} = N/(Z + T)$, which can be rewritten as

\[ \hat{c} = \frac{T + (Z - Z)}{(T/N)(Z + T)} = \frac{1 - Z/(Z + T)}{\bar{T}} = \frac{1 - \hat{p}}{\bar{T}}, \]

as given in Eq. 27.
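
The agreement between the maximum likelihood estimates and Eq. 27 can be seen on a small made-up data set. In the sketch below the function names and the per-subject counts are hypothetical; only the formulas $\hat{p} = Z/(Z + T)$, $\hat{c} = N/(Z + T)$, and Eq. 27 come from the text.

```python
def ml_estimates(z_counts, t_counts):
    """Maximum likelihood estimates (p-hat, c-hat) from presolution protocols.

    z_counts[s] and t_counts[s] are the numbers of presolution CRs and errors
    for subject s; the estimates are Z/(Z+T) and N/(Z+T).
    """
    n = len(t_counts)
    z, t = sum(z_counts), sum(t_counts)
    p_hat = z / (z + t)
    c_hat = n / (z + t)
    return p_hat, c_hat

def c_from_eq27(p_hat, mean_errors):
    """Eq. 27: c-hat = (1 - p-hat) / mean number of errors."""
    return (1.0 - p_hat) / mean_errors

# Hypothetical counts for four subjects (made up for illustration).
z_counts = [3, 5, 2, 6]
t_counts = [4, 5, 3, 4]
p_hat, c_hat = ml_estimates(z_counts, t_counts)
mean_errors = sum(t_counts) / len(t_counts)
print(p_hat, c_hat, c_from_eq27(p_hat, mean_errors))
```

Both routes give the same value of $\hat{c}$ for these counts, as the algebraic rewriting above requires.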


5.1 Additivity of Categories

From Eq. 25, it is clear that $r$, and therefore $c$, must vary directly with the number and weights of relevant categories (cues) and inversely with the number and weights of irrelevant categories (cues). Specifically, $c$ is

\[ c = \theta\, \frac{\sum_{i \in R} w_i}{\sum_{i \in R} w_i + \sum_{j \in I} w_j}. \tag{28} \]

By using Eq. 28,* statements regarding the effects on performance of certain variations in the stimulus situation can be derived within the model. Consider the ordinary two-choice cue-additivity situation as described in Sec. 4.2. Let $T_1$, $T_2$, and $T_3$ be the total number of errors made in the conditions where $D_1$, $D_2$, and $D_1$ plus $D_2$ are relevant, respectively. The model predicts that

\[ T_3 = \frac{T_1 T_2}{T_1 + T_2}. \]

The proof of this equation is facilitated by first proving a lemma.

Lemma. If $c_1$, $c_2$, and $c_3$ are the learning rates for the conditions where $D_1$, $D_2$, and $D_1$ plus $D_2$ are relevant, respectively, then $c_3 = c_1 + c_2$.

PROOF. From the definition of $r$ as given in Eq. 25 or 28,

\[ c_1 = \theta\, \frac{w_1}{w_1 + w_2 + w_3}, \]

\[ c_2 = \theta\, \frac{w_2}{w_1 + w_2 + w_3}, \]

\[ c_3 = \theta\, \frac{w_1 + w_2}{w_1 + w_2 + w_3} = \theta\, \frac{w_1}{w_1 + w_2 + w_3} + \theta\, \frac{w_2}{w_1 + w_2 + w_3} = c_1 + c_2, \]

as required.
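
The lemma can be illustrated with exact arithmetic. In the sketch below the weights, $\theta$, and the chance level $p$ are hypothetical values, and the function name is an assumption; the learning rates follow Eq. 28, and the implied error totals use Eq. 27 rearranged as $T = (1 - p)/c$.

```python
from fractions import Fraction

def learning_rate(theta, relevant, irrelevant):
    """Eq. 28: c = theta * sum(relevant weights) / sum(all weights)."""
    return theta * sum(relevant) / (sum(relevant) + sum(irrelevant))

# Hypothetical weights: w1 for D1, w2 for D2, w3 for the remaining categories.
theta = Fraction(1, 2)
w1, w2, w3 = Fraction(2), Fraction(3), Fraction(5)

c1 = learning_rate(theta, [w1], [w2, w3])        # only D1 relevant
c2 = learning_rate(theta, [w2], [w1, w3])        # only D2 relevant
c3 = learning_rate(theta, [w1, w2], [w3])        # D1 and D2 relevant, redundant

# Errors implied by Eq. 27 rearranged, T = (1 - p) / c, with chance level p.
p = Fraction(1, 2)
t1, t2, t3 = (1 - p) / c1, (1 - p) / c2, (1 - p) / c3
print(c1, c2, c3, t1, t2, t3)
```

For these values $c_3 = c_1 + c_2$ holds exactly, and the implied error totals satisfy $T_3 = T_1 T_2/(T_1 + T_2)$, whatever the particular weights chosen.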

Note that although the weights were used in the foregoing proof, they
do not appear in the lemma. Thus, although there are (complicated) ways
of determining category weights, it is seen that their formal properties
may be utilized in derivations within the model without the need of
determining their actual values arising. Using the lemma, we may now
prove the following theorem.

Theorem. If $T_1$, $T_2$, and $T_3$ are the total number of errors for the conditions
where $D_1$, $D_2$, and $D_1$ plus $D_2$ are relevant, respectively, then

\[
T_3 = \frac{T_1 T_2}{T_1 + T_2} .
\tag{29}
\]


PROOF. From Eq. 27, $c = (1 - p)/T$, or, for a particular condition,
$c_i = (1 - p)/T_i$. Therefore $T_i = (1 - p)/c_i$, and in particular
$T_3 = (1 - p)/c_3$. However, from the lemma, $c_3 = c_1 + c_2$. Therefore, by
substitution,

\[
T_3 = \frac{1 - p}{c_1 + c_2}
    = \frac{1 - p}{(1 - p)/T_1 + (1 - p)/T_2}
    = \frac{1 - p}{[(1 - p)(T_1 + T_2)]/T_1 T_2}
    = \frac{T_1 T_2}{T_1 + T_2},
\]

as required.

* Equation 28, as well as Eq. 25, is properly a definition. The assumption is that
category selection is probabilistic, as given in the stimulus axioms. Equation 28 is then
offered as one of the possible ways for interpreting the assumed probability.


Note that Eq. 29 does not involve any of the parameters of the model
nor any of the category weights used in the definition of r. All these are
used in the proof, but the final statement involves only the total number of
errors made under the three conditions.
Bower and Trabasso cite data which give outstanding support to their
model. In a T-maze task for rats, Scharlock (1955) reports $T_1$ = 9.7 for
place learning, $T_2$ = 6.7 for response learning, and $T_3$ = 4.0 for place
plus response learning. The last value is to be compared with 3.97, the
value predicted by the model using Eq. 29. In a discrimination problem
[...], $T_1$ = [...] for position relevant, $T_2$ = [...], and $T_3$ = 8.00 for
object plus position relevant. Substituting the values into Eq. 29 yields
the model's prediction of 8.03 for $T_3$.
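The additivity result can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions: `predicted_t3` simply evaluates Eq. 29, the two T-maze values are the ones quoted above from Scharlock, and the weights and chance probability in the second half are hypothetical numbers chosen only to show that the lemma and the theorem agree.

```python
def predicted_t3(t1, t2):
    """Eq. 29: predicted errors when both dimensions are relevant."""
    return t1 * t2 / (t1 + t2)

# Scharlock's T-maze values as quoted in the text (T1 = 9.7, T2 = 6.7).
# The result is about 3.96, close to the 3.97 quoted and the observed 4.0.
t3_scharlock = predicted_t3(9.7, 6.7)

# Hypothetical weights: w1 and w2 for D1 and D2, w3 for the remaining
# irrelevant cues; p is a hypothetical chance-success probability.
w1, w2, w3, p = 2.0, 3.0, 5.0, 0.5
total = w1 + w2 + w3
c1, c2, c3 = w1 / total, w2 / total, (w1 + w2) / total

# Eq. 27 rearranged: T_i = (1 - p) / c_i for each condition.
T1, T2, T3 = (1 - p) / c1, (1 - p) / c2, (1 - p) / c3
```

With these hypothetical weights, `T3` computed from the learning rates coincides with `predicted_t3(T1, T2)`, even though neither the weights nor p appears in Eq. 29.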

In summary, Eq. 29 is a statement, derived from the assumptions of the
model, which is true in several category-additivity situations. This
[...] cogent for such concept-utilization situations [...] stimulus axioms
are true of the stimulus situation. [Caution must be] exercised, however,
since the possibility exists that [the agreement] between model and data
is because of a [...] which are true taken individually. [...] on this
model in particular, but rather on all models [...] of the foregoing lemma
and theorem [...] assumptions, plus the definition of r given by Eq. 25,
were used in their proofs.


5.2 Single-Category Solutions


[...] the subject selects a category on each
trial and, following the critical event, he always makes the correct response
for that category. What if, as in the preceding section, there are two or
more relevant but redundant categories for each stimulus? Since on any
given trial only one of them is selected, it follows that on the critical-event
trial the correct response becomes conditioned to only one category.
Furthermore, because the category to which the correct response is
conditioned is the only category selected thereafter, the correct response
cannot become conditioned to any of the additional redundant categories.
Thus in a situation where there are two relevant redundant categories for
each stimulus, the subject is seen as solving the problem using only one of
them.

Suppose $D_1$ and $D_2$ are relevant and redundant, and suppose that on
solving the concept-utilization problem a new problem is introduced
where $D_1$ is relevant and $D_2$ does not appear. Then, if the model is correct,
the subjects should be neatly divided into two groups: [...]
in solving the first problem should show perfect transfer to the second [...].
[...] (Trabasso, 1963) was designed to test this prediction, and may be
introduced [...] proportion $w_1/(w_1 + w_2)$ [of the subjects will solve the
concept-]utilization task using only $D_1$ and proportion $w_2/(w_1 + w_2)$ will
solve using only $D_2$. This is because these proportions are the probabilities
that, for any given stimulus presentation prior to solution, $D_1$
and $D_2$, respectively, will be selected. [...] be estimated
from data based on [...] $D_1$ and $D_2$, respectively, are
relevant (see Trabasso, 1963, [...] for procedural details). Thus if $D_1$ and $D_2$
are relevant and redundant [...] second task $D_1$ is
relevant and $D_2$ is absent, it [...] that proportion $w_1/(w_1 + w_2)$ of
the subjects will make no errors in the second task and that proportion
$w_2/(w_1 + w_2)$ [...] learning rate [...]. Using
this argument, Trabasso [...] 4.72 out of 20 subjects solved the
first task using $D_1$ and hence would make no errors in the second task.
Five such subjects were [...]. [...] parameter $c_1$ as estimated
from the appropriate control situation (a single problem with $D_1$ relevant
and $D_2$ absent), Trabasso predicted [...] 15 subjects would
make, on the average, 16.0[...] errors. The
observed mean was 16.45.

A second aspect of the single-category solution implication is that once
a subject attains the concept, that is, once the critical event has occurred,
responding should be (1) [...] of the [...]
categories and (2) at the chance level to irrelevant cues presented in
isolation. Bower and Trabasso report experimental results which suggest
that the latter might be true. They used trigrams as stimuli which could
be categorized two ways ($R_1$ or $R_2$) according to whether the middle letter
was an F or a Y. After reaching a criterion on this task, the subjects were
then tested on the various letters separately, on the pairs which could be
formed from the various letters, and on the trigrams themselves. The
probability of a CR when a trigram was tested was 0.92; when either F
or Y was tested alone, 0.89; and when any pair which contained F or Y
was tested, 0.88. The two response alternatives $R_1$ and $R_2$ occurred about
equally often when irrelevant letters were presented, either singly or in
pairs, thus confirming the prediction that responding should be at the
chance level to irrelevant cues presented in isolation.


5.3 The All-or-None Assumption

[...] trials prior to the critical event and the
[...] of the terminal state C together constitute an assumption
to [the effect that learning is all-or-none: nothing is]
learned [...] any particular category-response pair. This
assumption [...] the conditioning model and is open to direct
[...] derivations within the model.

Because [an error indicates] that the subject is still in state U, that
is, [that] the critical [event has not] occurred, it must be that a change in the
experimental [...] immediately following an
error [...]. [...] Bower and Trabasso proceeded as follows.
Three groups [...] concept-utilization task
where [...] binary dimensions which were,
respectively, relevant [...]. [...] after trial 9, the
subjects of Group R were given a reversal shift task, those of Group NR
were given a non[reversal shift task, and those of Group] C were continued
on the original task. [...] $R_1$ and [...] $R_2$ for Group R and [...]
beginning a criterion run of successes [...]. [If]
nothing has been learned up to the [error trial on] which the shift was made,
then, in terms of performance following that error, that is, following the
shift, the three groups [...]. [...] mean [...]
error was 38.33, 39.56, and 36.94, respectively. Since
these values do not differ among themselves significantly, Bower and
Trabasso conclude that the all-or-none implication of their model is
acceptable.


5.4 The Role of Error Trials

Section 5.3 dealt with a particular composite assumption, namely, that
learning is all-or-none; this section deals with a more limited aspect of
the conditioning model, namely, that there is a constant probability of
moving from U to C on each trial. As the model now stands, a subject in
state U who selects an irrelevant category and then by chance gives a
correct response is assumed again to select a category at random on the
next trial. Intuitively, one might suspect that a subject would actually
continue to select that same category until he made an error, and then
either shift to another category or reselect at random. The question is,
can the model be improved by altering the conditioning model in such a
way that the subject is assumed to shift (probabilistically) from U to C only
on error trials and never on correct-response trials? The comparison of
such a new model with the original one then permits at least some
evaluation of the conditioning assumption of the original model.

To assume that the subject shifts from U to C only on error trials is to
construct a new model. The new model is identical with the old one except
that the second conditioning axiom is replaced by the following new
statement: For each reinforced trial on which the subject has selected the
correct category and has made an error, there is a constant probability d
that the correct response is conditioned to that category; if a correct
response is made, conditioning does not occur. If we now take e = rd,
e is the probability of the critical event on an error trial. In this new model,
U is subdivided into two states, E and S, and hence there are three states
in all:

C: the terminal state, as before;
E: the unconditioned state just prior to an error;
S: the unconditioned state just prior to a chance success.

The probability of a correct response, then, in the three states is 1, 0,
and 1, respectively, and the probability of being in each of the three states
on the first trial is 0, 1 - p, and p, respectively, where p is the probability
of selecting $R_1$ at random. The matrix of transition probabilities for this
new model is

\[
P \;=\;
\begin{array}{c|ccc}
  & C & E & S \\ \hline
C & 1 & 0 & 0 \\
E & e & (1 - p)(1 - e) & p(1 - e) \\
S & 0 & 1 - p & p
\end{array}
\]

Thus, for example, the probability of shifting from E to S is the probability
of the joint event of conditioning not occurring, 1 - e, and of the next
trial being a chance correct response, p. As before, C is absorbing.
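The three-state chain can be checked with a short computation. This is a minimal sketch, not part of the original treatment: it uses the standard absorbing-chain argument (the fundamental matrix of the transient block) to obtain the expected number of visits to state E before absorption, and because the probability of a correct response in E is 0, each such visit is one error. The function names and the numerical values of p and e used with them are hypothetical.

```python
def expected_errors_modified(p, e):
    """Expected visits to state E before absorption in C, starting from
    the initial distribution (0, 1 - p, p) over (C, E, S)."""
    # Transient block Q over (E, S), read off the transition matrix.
    q_ee, q_es = (1 - p) * (1 - e), p * (1 - e)
    q_se, q_ss = 1 - p, p
    # Entries of I - Q, then its 2x2 inverse written out by hand.
    a, b = 1 - q_ee, -q_es
    c, d = -q_se, 1 - q_ss
    det = a * d - b * c
    n_ee = d / det    # expected visits to E when starting in E
    n_se = -c / det   # expected visits to E when starting in S
    return (1 - p) * n_ee + p * n_se

def row_sums(p, e):
    """Row sums of the full transition matrix; each should equal 1."""
    rows = [
        [1.0, 0.0, 0.0],
        [e, (1 - p) * (1 - e), p * (1 - e)],
        [0.0, 1 - p, p],
    ]
    return [sum(r) for r in rows]
```

For any p below 1 and any positive e the expected number of errors works out to 1/e, independent of p; for instance, `expected_errors_modified(0.5, 0.1)` and `expected_errors_modified(0.25, 0.1)` both evaluate to 10 up to rounding.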
As in the original model, statements can be derived that then can be
checked against data. Most of these derived statements, however, look
very much like those derived in the original model, and hence comparison
of the two forms of the conditioning axiom becomes problematic. The
difficulty arises because most of the test statistics (derived statements) are
in terms of errors and therefore are in some way dependent on the
prediction, made in both models, that the trials prior to the critical event are
Bernoulli trials. One way to circumvent this is to design two experimental
situations that yield considerably different p values (probability of randomly
selecting a given response alternative) and base the comparison of the two
forms of the axiom on test statistics that involve p.

To see this more clearly, consider the expected number of errors
predicted by the two models. The prediction of the original model is given
by Eq. 27:

\[
E(\text{errors}) = \frac{1 - p}{c} .
\tag{30}
\]

In the modified model, however, the prediction is

\[
E(\text{errors}) = \frac{1}{e} .
\tag{31}
\]


[In the] original [model, the trials] expected prior to the critical event
[include] both error and correct-response trials, and hence
[...]. [In the] modified model, only error
[trials enter into] consideration; hence the expected number of errors
is simply [1/e]. Comparing the two conditioning assumptions
[...] from the [...] expression giving the expected number
of errors [...] p, [in the] other it does not.
Therefore, one [model] predicts variations in the expected number of errors
with variations in the number of response alternatives, whereas the other
does not.

[...] for the two forms of
the conditioning [axiom ...] differentially sensitive to variations in p is the
expected [trial number of the last error]. [For the] original model, where the
critical event may occur on any type trial, the expectation of this statistic
[is]

\[
\frac{1/c}{1 + cp/(1 - p)},
\tag{32}
\]

whereas for the modified model, where the critical event can occur only on
error trials, it is

\[
\frac{1}{(1 - p)e} .
\tag{33}
\]

Equation 32, which is only slightly sensitive to variations in p, becomes
smaller as p increases, whereas Eq. 33, which is very sensitive to variations
in p, becomes larger as p increases. Thus the original model predicts that
the trial number of the last error will be slightly lower in a two-choice
than in a four-choice concept-identification task, a prediction to be
compared with that of the modified model, which says that the trial number of
the last error will be appreciably higher in the two-choice than in the
four-choice task.

Bower and Trabasso report results of original research that give rather
convincing support to the modified model. In essence, the experimental
situation consisted of two- and four-choice tasks where, for example, one
group learned to give $R_1$, $R_2$, $R_3$, and $R_4$ to red, green, blue, and brown
stimuli, respectively, whereas another group learned to give $R_1$ to both red
and green and $R_2$ to both blue and brown. For the former subjects p
turned out to be approximately 1/4, whereas for the latter it was close to 1/2;
thus the two conditions, the two- and four-choice tasks, afforded the
desired manipulation of p. The critical result, however, is the average
trial of the last error for the two tasks. The mean for the two-choice task
was 1.44 times the mean for the four-choice task (26.36/18.32 = 1.44).
This finding is almost exactly what is predicted by the modified model.
If the estimated p value for the four-choice task is substituted in Eq. 33
and the result divided into the corresponding value obtained when the
estimated p value for the two-choice task is substituted in Eq. 33, the
result is 1.5. The original model, on the other hand, predicts a ratio less
than 1 by an amount which is a function of the size of c. For c = 0.037,
the ratio of the trial numbers for the last error for the two- and four-choice
tasks is predicted to be 26.04/26.52 = 0.98. It is clear that something is
wrong with the original model and that the modification is an important
one. Bower and Trabasso conclude that learning takes place only on
error trials, which is the hypothesis underlying the modification.
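The two predicted ratios can be reproduced approximately from Eqs. 32 and 33. The sketch below is an illustration only: it assumes the nominal chance levels p = 1/2 and p = 1/4 rather than the estimated p values used in the text, so the individual trial numbers differ slightly from those quoted, and the value of e is an arbitrary placeholder because it cancels in the ratio. The function and constant names are not from the source.

```python
def last_error_original(c, p):
    """Eq. 32: expected trial number of the last error, original model."""
    return (1 / c) / (1 + c * p / (1 - p))

def last_error_modified(e, p):
    """Eq. 33: expected trial number of the last error, modified model."""
    return 1 / ((1 - p) * e)

# Nominal chance levels for the two- and four-choice tasks (assumed).
P_TWO, P_FOUR = 0.5, 0.25

# Modified model: e cancels in the ratio, so any positive value will do.
E_PLACEHOLDER = 0.05
ratio_modified = (last_error_modified(E_PLACEHOLDER, P_TWO)
                  / last_error_modified(E_PLACEHOLDER, P_FOUR))

# Original model with the value c = 0.037 quoted in the text.
ratio_original = (last_error_original(0.037, P_TWO)
                  / last_error_original(0.037, P_FOUR))
```

Under these assumptions `ratio_modified` equals 1.5 and `ratio_original` falls just below 1, which is the qualitative contrast the text draws against the observed ratio of 1.44.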


6. SUMMARY AND CONCLUSIONS

Four speculations as to what subjects do in a concept-utilization task have
been considered. None of these speculations is testable by direct
observation. In each case, however, the speculation was formulated with sufficient
precision to allow the deduction of new statements which are testable by
direct observation. Usually, the deductions rested on the probability
calculus, a convenience arising from the fact that the conjectures were
expressible as equations. The several theorists involved hypothesized
variously that concept utilization could be viewed as paired-associate
learning, as cue conditioning, as strategy selection, and as a combination of
selection and conditioning.

Bower's one-element model for paired-associate learning was seen to
be inadequate as a descriptive device for the concept-utilization situation.
In making this application, Suppes and Ginsberg fitted the model twice,
once using the relevant categories as the stimuli and once using the separate
instances of the relevant categories as stimuli. It was argued that the
former procedure involved a greatly oversimplified representation of the
stimulus situation and hence that the stimulus axiom of the model was not
satisfied.

The derived statements following from Bourne and Restle's
cue-conditioning conjecture turned out to be impressively appropriate for the
experimental situation studied, especially regarding variations in relevant
redundancy and irrelevant additivity. To date, this model has generated
and handled more research data than any of the others.

Restle's strategy selection model was seen to be inadequately tied to the
stimulus situation, a shortcoming which produces difficulties when
statements about variations in the stimulus situation are to be derived. The
difficulty is that the arguments leading to predictions cannot be made
completely within the model. This makes comparisons between theory
and data of doubtful legitimacy.

The final model considered was Bower and Trabasso's
selection-conditioning model, a conjecture which represents concept utilization
as a two-stage process: selection of a category and selection of a response
alternative. Conditioning of a response alternative was hypothesized to
be an all-or-none affair contingent on selection of a relevant category. The
[...] examined, and a modified conjecture about the
role of error trials was shown to improve the model.

Making comparisons among the four models, we cannot help but detect
[...] which (1) make a firm contact [...] and (2) allow for the
operation of [more than] one process at a time. Restle's strategy selection
[model] is poor in [...] a single-process model, and consequently
poor in [...] successful models, Bourne and
Restle's cue-conditioning [model] and Bower and Trabasso's
selection-conditioning model, both make good contact and postulate multiple


Preference, Utility, and
Subjective Probability


Of the major areas into which experimental psychology has been
traditionally partitioned, motivation is the least well understood and
systematized. This is true whether we consider theory, experimental
paradigms, or experimental results.


The psychology of motivation, compared with that of learning or of sensory
and perceptual processes, is peculiarly retarded and confused. Much of what
passes for discussion of motivation is pieced together out of fragments from
physiological psychology, learning, personality, social psychology, and
psychopathology. These patches often look violently out of place; worse still, they
conceal the underlying cloth so well that one may doubt whether it exists at all
(Irwin, 1958, p. 152).


Nevertheless, as Irwin goes on to point out, at least one motivational
concept, preference, is just as basic to psychological theory as, for example,
are the concepts of discrimination and instrumental conditioning that
have arisen in the other more developed branches of psychology.
Moreover, of the various notions usually considered to be primarily motivational,
preference is the only one that mathematical psychologists have attempted
to analyze with any care: there are almost no satisfactory formal theories
concerning, for example, drive and incentive, and those that exist are best
discussed as aspects of learning. So this chapter on mathematical theories
of motivation is limited to a study of preference and to the closely related
constructs of utility and subjective probability.


1. GENERAL REMARKS ON THE STUDY OF PREFERENCE


1.1 Origins of the Mathematical Theories of Preference

[...] of preference as a [...], it would badly wrench history to suggest
that [...] people [who] class themselves as [psychologists ...]
worked out by economists and statisticians who needed psychological





underpinnings for their decision theories. Only in the last half dozen
years have psychologists begun to isolate for separate study these inherently
psychological theories of preference. While being elaborated as distinct
and testable psychological theories, the theories of preference have begun
to acquire a richness and complexity (hopefully reflecting a true richness
and complexity of behavior) that renders them largely useless as bases
for economic and statistical theories. Perhaps we may ultimately find
simple, yet reasonably accurate, approximations to the more exact
descriptions of behavior that can serve as psychological foundations for
other theoretical developments, but at the moment this is not the main trend.

The psychological reader should be warned that statisticians and
economists have long employed a technique of nonempirical, rationalistic
argument which is totally foreign to many psychologists, who tend to the
other extreme of automatically rejecting plausible hypotheses unless they
have powerful experimental support. Psychologists sometimes seem
slightly naive in their reverence for what are held to be pure empirical
facts, when actually most experimental inferences depend on some more
or less implicit theoretical position, often partially embodied in a
statistical model. Be that as it may, many theories of preference were formulated
and evaluated initially in terms of what a “rational” person having
unlimited computational abilities and resources ought to do. Frequently
these debates have a somewhat tenuous quality for psychologists, especially
since there does not exist any really satisfactory comprehensive definition
of rationality and only a few of its properties are generally agreed on.

Some psychologists have simply rejected work of this genre as sterile,
but those who have taken an active interest in it have found two research
paths open. First, can these so-called normative theories also be adequate
as descriptions of behavior? Attempts to answer this question have led to
laboratory tests of the normative theories, with, however, mostly
ambiguous results. Second, can theories be devised that are frankly
descriptive in intent, that try to encompass explicitly some of the
phenomena that seem to be found in the laboratory? Several attempts are
discussed.

Our attitude in this chapter is primarily psychological: we ask whether
a theory seems to describe behavior, not whether it characterizes a rational
man; we report experiments, although admittedly not enough
experimental work has yet been done to provide us with either completely
satisfactory designs or highly reliable phenomena; and we explore some
of the relations between theories of preference and other psychological
theories. At the same time, we try to recount the normative considerations
that led originally to the theories and to cite the more important
rationalistic criticisms that have been leveled against them.

1.2 Preference, Discrimination, and Bias

Irwin (1958, p. 152) developed in detail the widely accepted thesis that
“preference is exactly as fundamental as discrimination and that,
indeed, the two are so intimately related in behavior that if the organism
exhibits a discrimination, it must also exhibit a preference, and conversely.”
Without reproducing the supporting argument in detail, we may suggest
the main point. Suppose that outcome $x_1$ results either when response $r_1$
is made to stimulus presentation $s_1$, or when $r_2$ is made to $s_2$, whereas $x_2$
results when $r_2$ is made to $s_1$ or when $r_1$ is made to $s_2$. Before we can
expect a subject to respond differentially and thus to show his ability to
discriminate between $s_1$ and $s_2$, it must matter to him which outcome
occurs; he must prefer one outcome to the other. Equally well, he can
evidence a preference between $x_1$ and $x_2$ only if he is capable of
discriminating the discriminative stimuli, for only then can he know (or learn) which
response is appropriate on each trial to achieve the preferred outcome.

A careful examination of Irwin's discussion suggests that this view is
correct: the notions of discrimination and preference are both
fundamental to an understanding of behavior, and they are profoundly
intertwined in parallel roles. If so, it appears perverse, if not worse, to divide
psychology (in particular, that portion based upon choice experiments)
into a part concerned primarily with the discrimination of stimuli and a
part concerned primarily with preferences among outcomes. We are
saved from total chaos, however, by the experiments that in principle we
should perform, but rarely do because we are certain of the results on the
basis either of the informal experimentation of experience or of studies
[...] psychology. In a psychophysical or learning
[experiment] we assume that we know what a subject's preferences are
[among the] outcomes. We are confident that hungry animals prefer
[...] receiving them, that a student prefers winning
five cents [...]. [In a] (human) preference experiment
[...] discriminative stimuli, for example,
[...] our knowledge of people [...] discriminable. We could perform the
necessary experiments to prove these assumptions, but usually we do not
because we [...]. This means, for example, that
if a subject exhibits some inconsistency in his choices in a preference
experiment, we automatically attribute it to an ambivalence about the
worth of the [outcomes and not to his] inability to tell which outcomes are
[...] stimulus-response pairs nor to his inability to
discriminate among the relevant stimuli.

Although the accuracy of these assumptions can always be questioned,
there is little doubt that serious attempts have been made to realize them
in the preference experiments we shall discuss here. Therefore, throughout
the chapter, we take it for granted that the subject has no difficulty in
discriminating the alternatives presented to him; our focus is entirely on
his preferences. This is not to deny that any ultimately satisfactory theory
of choice behavior will have to encompass discrimination, preference, and
learning. The current partitioning is a convenient way to subdivide a
complex problem into somewhat more manageable chunks.

A second point emphasized by Irwin (1958) is more deeply confounding
in practice than the interlocking of preference and discrimination.
Subjects often (almost always, in fact) exhibit “preferences” among the
responses as well as among the outcomes. To keep the terminology
straight, we follow Irwin by speaking of these as biases among the responses.
Perhaps the most striking example of response biases is the frequently
noticed position habits of animals that seem to occur whenever the
discrimination is at all difficult or whenever the differences among the
outcomes are slight. Furthermore, it should be recalled that response
biases are ubiquitous in psychophysics (see Chapters 3, 4, and 5 of Vol. I)
and that the more recent psychophysical theories explicitly provide for
them.

Unfortunately, the same cannot be said for the mathematical theories
of preference. None of them acknowledges response biases. To some
extent, theoreticians either have not been particularly sensitive to the
phenomenon or they have felt that the biases arise from relatively trivial
experimental errors that can be overcome by appropriate techniques, such
as randomization. Actually, however, response biases seem to be a fairly
deep problem, and no one knows how to eliminate them experimentally.
Randomization is no solution, despite its widespread acceptance among
experimentalists, because it only buries the biases in increased variability;
it does not eliminate them. We badly need theories of preference that
explicitly provide for response biases.


1.3 A Classification of Theories of Preference

As in psychophysical research, preference studies have been largely
restricted to steady-state (asymptotic) behavior. Behavioral transients,
which are the province of learning, are sufficiently complicated to analyze
that only the simplest learning experiments have been dealt with in any
detail. These learning experiments are too simple to provide answers
to many of the questions that we wish to ask about preferences, and so
we are forced to confine our attention to asymptotic behavior.
Unfortunately, some recent studies (Edwards, 1961a; Lindman & Edwards,
1961) suggest that true experimental asymptotes may be harder to come
by than one would wish.

Because the focus is on equilibrium behavior, many of the theories of
preference are simply static in character, including no explicit mechanism
for temporal changes. A few recent theories, however, derive asymptotic
mean predictions from well-specified stochastic learning processes. As we
shall see (Sec. 7.3), the predictions of these two types of theories are not
very compatible, but the problem does not seem to be simply to choose
between them; rather, some sort of fusion is needed. The static theories
embody, admittedly roughly, something of a subject's cognitive or rational
analysis of the choice problem, whereas the asymptotic learning theories
encompass the fine-grain adjustments made by subjects to their recent
experiences. One can hardly doubt that both mechanisms exist in people.

Several distinctions that are important both substantively and
mathematically can be made among these equilibrium theories of preference.
We make three and use them to organize the chapter.
The first is between what may be called algebraic and probabilistic
theories. Let us suppose that the responses that a subject makes to
stimulus presentations are governed by probability mechanisms. Then
the theory can be viewed as algebraic if all such probabilities are either 0,
1/2, or 1. Although this definition is strictly correct, it misleadingly suggests
that the algebraic theories are simply special cases of the probabilistic
ones. They are not. The algebraic ones employ mathematical tools,
namely, algebraic tools, that are different from those used in the
probabilistic theories, and so it is appropriate to develop them separately.
Historically, the algebraic theories were studied first, and they have been
used in economics and statistics almost exclusively. The probabilistic ones
are largely the product of psychological thought, forced upon us by the
data we collect in the laboratory.
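The formal criterion for an algebraic theory can be stated as a one-line check. The sketch below is only an illustration of that definition; the function name and the two sets of pairwise choice probabilities are hypothetical and do not come from the source.

```python
def is_algebraic(choice_probabilities):
    """True when every choice probability is 0, 1/2, or 1."""
    return all(q in (0.0, 0.5, 1.0) for q in choice_probabilities)

# Hypothetical pairwise choice probabilities P(first chosen over second).
strict_and_indifferent = {("a", "b"): 1.0, ("b", "c"): 0.5, ("a", "c"): 1.0}
graded = {("a", "b"): 0.8, ("b", "c"): 0.6, ("a", "c"): 0.9}

algebraic_case = is_algebraic(strict_and_indifferent.values())
probabilistic_case = is_algebraic(graded.values())
```

The first set satisfies the criterion and the second does not, which is the sense in which the definition separates the two kinds of theory even though, as the text notes, the algebraic theories are not merely special cases of the probabilistic ones.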

Some authors have chosen to refer to the probabilistic theories of
preference as stochastic theories, and this usage is gaining in popularity.
This is unfortunate because it blurs a valuable distinction.
A [...] that includes a time parameter, continuous or
discrete, is [...] a stochastic process (see Chapter 20). One
[...] transitions over time should be called something else.
Th[...] learning theory for a preference experiment
[...] described as a stochastic theory of preference, but theories
that [...] probabilistic choice mechanism without a means of
changing the probabilities in time (for example, with experience) we
speak of as probabilistic theories.

The second distinction, an experimental one, is between certain and
uncertain outcomes. It is simply a question of whether the outcome is
prescribed by the stimulus presentation plus the response or whether these
only determine a probability distribution over the outcomes. In terms of
the notation of Chapter 2 (p. 86), if s is the presentation and r the response,
the conditional outcome schedule $\omega$, determined by r and s, is a random
variable with the trials as its domain and the set A of outcomes as its range.
If the outcome schedules have the special property that for all $x \in A$ and all
$\omega \in \Omega$, where $\Omega$ is the set of conditional outcome schedules, the conditional
probability $\pi(x \mid \omega)$ is either 0 or 1, then we shall say that the experiment
has certain outcomes. Otherwise, the outcomes are uncertain.

Theories for certain outcomes are usually stated in such a way that, in
principle, they can be applied to uncertain outcomes; however, in that
context they usually seem to be very weak theories because they fail to
take into account the complex structure of the uncertain outcomes. On the
other hand, theories for uncertain outcomes that explicitly include the
probability distribution over the set of outcomes can always be specialized
to the certain case, but again the effect is to produce weak theories.
Presumably, this is a temporary problem and ultimately the specialization
of a theory for uncertain outcomes will yield a satisfactory theory for
certain ones.

Statisticians sometimes attempt to make further distinctions among what
we are calling uncertain outcomes. When the subject knows the probability
distribution over the outcomes, they speak of risky choices, and they
reserve the word "uncertain" either for all cases that are neither risky
nor certain or for those in which the subject has no information at all
about the distribution. As these distinctions are difficult to make precise
and they do not seem particularly useful for our purposes, we shall not
bother with them.

To add further to the terminological confusion, information theorists
use the word "uncertain" to describe situations in which the probability
distributions are known, that is, to refer to situations that decision
theorists describe as pure risk. Their use is not inconsistent with ours,
but it is less broad.

The third and final distinction is between simple choice experiments
and ranking experiments. Strictly speaking, both are choice experiments
as that term is used in Chapter 2. The question is whether the subject is
asked to select among several outcomes or whether he is asked to rank
order them, that is, to select from the set of rankings. It is generally
believed that there must be some regular relation between his behavior in
these two kinds of experiments when the same outcomes are involved,
and the ranking theories attempt to describe these relations.
With three binary distinctions, there ought to be eight theoretical
sections to the chapter; actually there are only five. The reasons are
that, at present, the algebraic ranking theories are trivial extensions of
the choice theories, and so do not warrant separate sections, and that no
special probabilistic ranking theories have yet been developed for
uncertain outcomes.


1.4 Previous Surveys of the Literature

In the last decade several excellent surveys of various aspects of the
subject matter of this chapter have appeared. Edwards (1954d, 1961b)
are particularly well known to psychologists. Adams (1960)
provides an excellent survey of Bernoullian utility theory (a topic
discussed in Sec. 3 of this chapter) for both economists and psychologists.
Arrow (1951b, 1964) and Majumdar (1958) are mainly addressed to
economists, but both articles contain many useful remarks that will help
psychologists who seek a general theoretical orientation in the field.
Material of a similar sort is also to be found in Luce and Raiffa (1957),
which is addressed to a general social science audience.
We emphasize that some of the work described and analyzed in these
sources is not covered in this chapter, and the interested reader is urged
to refer to them.


2. GENERAL ALGEBRAIC CHOICE THEORIES

As was noted previously, algebraic choice theories for certain outcomes
have a long history in classical economics, beginning at least as early as
the work of Jeremy Bentham in the eighteenth century. His definition of
utility, given in his famous The Principles of Morals and Legislation
(1789), is still a good starting point:

By utility is meant that property in any object, whereby it tends to produce
benefit, advantage, pleasure, good, or happiness (all this in the present
case comes to the same thing), or (what comes again to the same thing) to
prevent the happening of mischief, pain, evil, or unhappiness to the party
whose interest is considered: if that party be the community in general,
then the happiness of the community; if a particular individual, then the
happiness of that individual.

Bentham interpreted his maxim of "the greatest good for the greatest
number" as meaning that utility is maximized, and he spent considerable
effort in formulating a program for measuring utility.

In the classical economic literature on utility, the most important
distinction is between ordinal and cardinal utility. The term ordinal refers
to the assumption of order only, whereas cardinal refers to the assumption
of additivity or, if not that, at least uniqueness of numerical assignment
up to a linear transformation (see Chapter 1, Vol. I, p. 12). During the
nineteenth century it was commonly believed that little economic theory
could be developed solely on the basis of ordinal utility functions. One of
Pareto's (1906) great achievements was to show that much could be done
with purely ordinal assumptions. It is fair to say that Pareto was the first
main contributor to the theory of ordinal utility functions, to which we
now turn.


2.1 Ordinal Utility Functions

Because of its structural simplicity, the theory of ordinal utility functions
is a good place to begin for logical as well as historical reasons.
Essentially the theory deals just with the qualitative preference for one
alternative over another. To apply mathematical analysis in this setting,
the primary thing that is needed is the representation of preference in the
form of a numerical utility function. Once such a numerical function is
available it may be used in subsequent theoretical developments, with the
possibility open of applying standard mathematical tools to problems of
behavior. For this reason, most of this section is devoted to a sequence of
increasingly general theorems on the existence of numerical utility
functions that reflect the qualitative preference structure.

Although psychologists find much of the economic literature on utility
rather far removed from direct experimental or behavioral questions, the
theory of ordinal utility functions was, in fact, developed to answer some
rather specific questions about consumer behavior. It is appropriate to
begin by sketching the setting of these questions. Most of what we say
here is drawn from Wold and Jureen (1953) and Uzawa (1960).

Suppose that a consumer has income M at time $t_0$ (in the sequel we
consider only the static case of $t_0$, so we drop explicit reference to
time). With his income the consumer may purchase a bundle of commodities;
traditionally one of these commodities may be savings. We can describe
such a bundle by a real n-dimensional vector $x = (x_1, \ldots, x_n)$, where
the ith component $x_i$ specifies the amount of commodity i to be consumed.
Following standard notation, we say that bundle x is greater than x', in
symbols $x \geq x'$, if for every i, $x_i \geq x_i'$. As the
foundation of the theory of consumer demand, a preference relation P on
commodity bundles is introduced. The relation xPy is read: commodity
bundle x is preferred to commodity bundle y. The consumer's income M,
the current market prices, and the structure of his preference relation P
determine his choice of commodities.
In addition to the relation P, it is also necessary to introduce a relation
I of indifference, for it is not reasonable that of two distinct bundles one
is necessarily strictly preferred to the other. Technical economy of
formulation is achieved at many points by replacing P and I by the weak
preference relation R. The relation R stands to P in the same way that the
numerical relation ≥ stands to >. The obvious equivalences are

xRy if and only if xPy or xIy,
xIy if and only if xRy and yRx,
xPy if and only if xRy and not yRx,

and we shall assume them without comment. The following postulates on
the structure of the (weak) preference relation R are reasonably intuitive,
and they guarantee the existence of a utility function.
Definition 1. A relation R on the set of all commodity bundles
(n-dimensional vectors) is a preference relation if the following axioms are
satisfied for any bundles x, y, and z:

1. Transitivity: if xRy and yRz, then xRz;

2. Connectivity: xRy or yRx;

3. Nonsatiety: if x > y, then xPy;

4. Continuity: if xRy and yRz, then there is a real number $\lambda$
such that $0 \leq \lambda \leq 1$ and $[\lambda x + (1 - \lambda)z]\,I\,y$.

Of these four postulates, the first two are of a very general nature;
their formulation does not depend in any way on x, y, and z being
n-dimensional vectors. When 1 and 2 are satisfied, the relation is called a
weak ordering. It is understood that the "or" of Axiom 2 is inclusive,
that is, both xRy and yRx may hold.

Axiom 3 is called the axiom of nonsatiety because it postulates that the
consumer, ceteris paribus, always prefers more of any commodity to less.
This is obviously unrealistic.

A utility function for R is a real-valued function u on the bundles such
that

$$u(x) \geq u(y) \text{ if and only if } xRy. \tag{1}$$

There is, however, a classical and important counterexample that shows
that an arbitrary weak ordering cannot be represented by a numerical
function.

Counterexample. Consider the lexicographic ordering of the plane:
$(x_1, x_2)R(y_1, y_2)$ if and only if either $x_1 > y_1$, or $x_1 = y_1$
and $x_2 \geq y_2$.

Suppose that there exists a real-valued function u satisfying Eq. 1. We fix
$x_2$ and $y_2$ with $x_2 < y_2$ and define for each $x_1$

$$u'(x_1) = u(x_1, x_2), \qquad u''(x_1) = u(x_1, y_2).$$

In terms of these functions define the following function f from real
numbers to intervals:

$$f(x_1) = [u'(x_1), u''(x_1)].$$

On the assumption that the ordering is lexicographic, f must be 1-1 since
two distinct numbers are mapped into two disjoint intervals. For instance,
if $x_1 > x_1'$, then $u'(x_1) = u(x_1, x_2) > u(x_1', y_2) = u''(x_1')$. But
it is well known that it is impossible for a one-to-one correspondence to
hold between the uncountable set of real numbers and a countable set of
nondegenerate disjoint intervals. Thus no such function f can exist, and a
fortiori there can be no function u satisfying Eq. 1 for the lexicographic
ordering.

Note that Axiom 4 is not satisfied by the counterexample because, for
example, the point (4, 2) is between (2, 2) and (4, 4) in the lexicographic
ordering but there is no number $\lambda$ such that
$\lambda(2, 2) + (1 - \lambda)(4, 4) = (4, 2)$.
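
The failure of Axiom 4 can be illustrated with a short Python sketch. The function names are illustrative assumptions, not notation from the text, and the scan over a grid of values of λ is only a numerical illustration of the fact that every mixture of (2, 2) and (4, 4) stays on the diagonal.

```python
def lex_weakly_prefers(x, y):
    """Lexicographic weak preference on the plane: True when x R y."""
    return x[0] > y[0] or (x[0] == y[0] and x[1] >= y[1])


def mixture(lam, x, z):
    """Componentwise mixture lam*x + (1 - lam)*z."""
    return tuple(lam * xi + (1 - lam) * zi for xi, zi in zip(x, z))


low, mid, high = (2, 2), (4, 2), (4, 4)

# (4, 2) lies between (2, 2) and (4, 4) in the lexicographic ordering.
is_between = lex_weakly_prefers(high, mid) and lex_weakly_prefers(mid, low)

# Both coordinates of a mixture of two diagonal points are computed the
# same way, so the mixture is diagonal and can never equal (4, 2).
grid = [k / 100 for k in range(101)]
hits = [lam for lam in grid if mixture(lam, low, high) == mid]

print(is_between, hits)
```

Since indifference in the lexicographic ordering is just equality, the absence of any such mixture is exactly what Axiom 4 forbids.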

We now show that the postulates of Def. 1 do guarantee the existence
of a utility function.

Theorem 1. Let R be a preference relation on the set of commodity
bundles in the sense of Def. 1. Then there exists a utility function u that
satisfies Eq. 1.

PROOF. We first define the function u for the "diagonal" bundles x,
which are defined by the property that $x_i = x_1$ for all i, by
$u(x) = x_1$. Since any two indifferent vectors must have the same utility,
we may extend u to any vector by constructing a diagonal vector to which it
is indifferent. Let y be any vector, let $y^*$ be the diagonal vector with
all of its components equal to the smallest component in y, and let
$y^{**}$ be the diagonal vector with all its components equal to the largest
component in y. Obviously, $y^{**} \geq y \geq y^*$, and therefore by the
axiom of nonsatiety, $y^{**}RyRy^*$. By the axiom of continuity, there is a
$\lambda$ such that $y\,I\,[\lambda y^{**} + (1 - \lambda)y^*]$, and
$\lambda$ can easily be shown to be unique whenever y is not a diagonal
vector. Since $\lambda y^{**} + (1 - \lambda)y^*$ is a diagonal vector, from
the definition of u we have, in terms of the first components of $y^*$ and
$y^{**}$,

$$u(y) = \lambda y_1^{**} + (1 - \lambda)y_1^*.$$

Let z be any other vector. We must show that

$$u(y) \geq u(z) \text{ if and only if } yRz.$$

First, assume that yRz and u(y) < u(z). Then

$$\lambda' z^{**} + (1 - \lambda')z^* > \lambda y^{**} + (1 - \lambda)y^*,$$

and therefore zRy by the nonsatiety axiom, and thus yIz. Therefore
u(y) = u(z), contrary to the supposition u(y) < u(z), so $u(y) \geq u(z)$.
Conversely, assume that $u(y) \geq u(z)$. Then by definition

$$\lambda y^{**} + (1 - \lambda)y^* \geq \lambda' z^{**} + (1 - \lambda')z^*,$$

and therefore by the axiom of nonsatiety

$$[\lambda y^{**} + (1 - \lambda)y^*]\,R\,[\lambda' z^{**} + (1 - \lambda')z^*].$$

Thus, because the ordering of R is preserved by substitution of
I-indifferent vectors, it follows that yRz, which completes the proof.
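
The construction in the proof can be sketched numerically. The following Python fragment is a minimal illustration under stated assumptions: the preference relation is supplied as a function `weakly_prefers`, here generated from a hypothetical hidden function (a sum of square roots) chosen only because it is continuous and strictly increasing; none of these names come from the text. The value of λ is located by bisection, which relies on the nonsatiety axiom to make the diagonal mixtures increase in preference with λ.

```python
import math


def hidden_utility(x):
    # Assumed example only: continuous and strictly increasing in each
    # component, so Axioms 1 to 4 hold on nonnegative bundles.
    return sum(math.sqrt(c) for c in x)


def weakly_prefers(x, y):
    """x R y for the assumed example preference."""
    return hidden_utility(x) >= hidden_utility(y)


def constructed_utility(y, weakly_prefers, iterations=60):
    """Utility of y by the diagonal construction of Theorem 1."""
    n = len(y)
    lo_val, hi_val = min(y), max(y)   # common components of y* and y**
    if lo_val == hi_val:              # y is itself a diagonal bundle
        return float(lo_val)
    lam_lo, lam_hi = 0.0, 1.0
    for _ in range(iterations):
        lam = (lam_lo + lam_hi) / 2
        d = lam * hi_val + (1 - lam) * lo_val
        if weakly_prefers((d,) * n, y):
            lam_hi = lam              # diagonal at least as good: lower lam
        else:
            lam_lo = lam              # diagonal strictly worse: raise lam
    lam = (lam_lo + lam_hi) / 2
    return lam * hi_val + (1 - lam) * lo_val


print(constructed_utility((1.0, 9.0), weakly_prefers))
```

For the bundle (1, 9) the indifferent diagonal bundle under the assumed preference is (4, 4), so the constructed utility is close to 4. The constructed u differs from the hidden function, as it should: only the ordering is reproduced.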
Given prices $p_1, \ldots, p_n$, a consumer with income M can afford to buy
any commodity bundle $x = (x_1, \ldots, x_n)$ such that

$$p_1 x_1 + \cdots + p_n x_n \leq M.$$

The behavioral prediction of the theory is that he will select a bundle that
maximizes utility, subject to this income restraint.
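
As a rough illustration of this prediction, the sketch below enumerates integer bundles and returns one of maximal utility among those the consumer can afford. The function name, the restriction to whole units, and the example utility function are all assumptions made for the illustration.

```python
from itertools import product


def best_affordable_bundle(prices, income, utility, max_units):
    """Search integer bundles for one maximizing utility within the budget."""
    best, best_u = None, None
    for bundle in product(range(max_units + 1), repeat=len(prices)):
        cost = sum(p * x for p, x in zip(prices, bundle))
        if cost <= income:                      # income restraint
            u = utility(bundle)
            if best_u is None or u > best_u:
                best, best_u = bundle, u
    return best


# Hypothetical two-commodity example with a strictly increasing utility.
choice = best_affordable_bundle(
    prices=(1, 2),
    income=4,
    utility=lambda x: (x[0] + 1) * (x[1] + 1),
    max_units=4,
)
print(choice)
```

Any increasing monotone transformation of the example utility would lead to the same choice, since only the ordering of the affordable bundles matters.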
The problem of the numerical representation of preferences is not
completely solved by proving the existence of a numerical utility function.
To know the extent to which numerical methods of analysis may
then be applied, it is also necessary to know how unique the obtained
utility function is. It is a simple matter to prove the following theorem
(see Chapter 1 for a more extensive discussion of questions of uniqueness).

Theorem 2. Let R be a preference relation in the sense of Def. 1. Then
any two utility functions satisfying Eq. 1 are related by an increasing
monotone transformation.

The restriction of the domain of utility functions to n-dimensional
commodity bundles is restrictive even in economics, and it is certainly
unacceptable for psychological investigations of preference and choice.
Fortunately, the most general circumstances under which an ordinal
utility function exists can be characterized. First, we generalize Def. 1
and say that a relation R on an arbitrary set A is a preference relation
for A if it is a weak ordering of A, that is, if it satisfies
Axioms 1 and 2 of Def. 1. Second, let B be a subset of A. Then we say
that B is R-order-dense in A if and only if for every x and y in A, but not
in B, such that xPy there is a z in B such that xRz and zRy. Note that the
denumerable set of rational numbers is order-dense with respect to the
natural numerical ordering ≥ in the nondenumerable set of all real
numbers. This relationship between the denumerable rational numbers
and all real numbers is just the one that is necessary and sufficient for the
existence of a utility function satisfying Eq. 1. With respect to a
preference relation that is a weak ordering a minor complication arises in
applying the denumerability condition, namely, the elements of the
order-dense subset must not be indifferent. This additional condition is
made precise in the statement of the theorem.

Theorem 3. Let A be an infinite set and let R be a preference relation on
A. Then a necessary and sufficient condition that there exist a utility
function satisfying Eq. 1 is that there is a denumerable subset B of A
such that (i) B is R-order-dense in A and (ii) no two elements of B stand
in the relation I, that is, for any distinct x and y in B either xPy or yPx.
PROOF. [The proof is related to the classical ordinal characterization of
the continuum by Cantor (1895). We do not give all details here, but
sketch the main outlines; for some additional details and related theorems,
see Sierpinski (1958, Chapter II) and Birkhoff (1948, pp. 31-32).]

To prove the sufficiency of the condition, let B be a denumerable subset
with properties (i) and (ii). Moreover, if A has endpoints with respect to
the ordering of R, we may without loss of generality include them in B.
First, we know that there exists a utility function u for B, just because B
is denumerable (this is Theorem 6, Chapter 1, p. 26). Now by the
R-order-dense condition on B, each element y of A that is not in B defines
a cut in B, that is, the partition of B into two sets,
$X = \{x \mid x \in B \ \&\ xRy\}$ and $Z = \{z \mid z \in B \ \&\ yRz\}$.
Now let

$$r_1 = \operatorname*{g.l.b.}_{x \in X} u(x)$$

and

$$r_2 = \operatorname*{l.u.b.}_{z \in Z} u(z).$$
We then extend u to y by defining

$$u(y) = \frac{r_1 + r_2}{2}.$$

It is easy to show that the utility function u thus extended from B to A
satisfies Eq. 1. For example, if $w_1$ and $w_2$ are in A but not in B, then
if $w_1 R w_2$, there is a z in B such that $w_1 R z R w_2$, and thus

$$u(w_1) \geq u(z) \geq u(w_2).$$

The remaining details of this part of the argument may be supplied by
the reader.

To prove the necessity of (i) and (ii), we assume that we have a function
u satisfying Eq. 1. The set of nonempty intervals $I_i$ of the real numbers
with rational endpoints is a denumerable set because of the denumerability
of the rational numbers. We next construct corresponding intervals $J_i$
of A by taking the inverse image under u of each interval $I_i$. (Note that
not every point in $I_i$ need correspond to a point in A.) From each $J_i$
that is not empty we select an element $x_i$. Since the set of intervals is
denumerable, the set X of elements $x_i$ of A is denumerable. Now, let B'
be the set of real numbers r such that for some y in A, u(y) = r and

$$u(y) - \operatorname*{l.u.b.}_{x \in Y} u(x) > 0,$$

where

$$Y = \{x \mid x \in A \ \&\ yPx\}.$$

Because this set B' defines a set of nonoverlapping intervals of real numbers
it is at most denumerable (of course, B' can, and in some cases would, be
empty). Let X' be the inverse image under u of B'. Then $B = X \cup X'$ is
denumerable. To show that B is order-dense in A, let $t_1$ and $t_2$ be two
elements in A but not in B such that $t_1 P t_2$. Now if there are no
elements of A between $t_1$ and $t_2$, then $t_1$ is in X', contrary to
hypothesis. On the other hand, if there are elements between $t_1$ and
$t_2$, then at least one, say z, will be such that u(z) lies in an interval
with rational endpoints which is nested in the interval
$[u(t_2), u(t_1)]$, and thus B is order-dense in A. This
completes the proof.
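
The cut used in the sufficiency argument can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: a finite list stands in for the denumerable order-dense subset B, the endpoints of A are included in it as the proof allows, and all names are hypothetical. With a finite stand-in the two bounds generally differ, so the midpoint is an approximation to the value the full construction would give.

```python
def extend_by_cut(y, dense_subset, u_on_subset, weakly_prefers):
    """Extend a utility defined on a subset B to an element y outside B."""
    # X: elements of B at least as good as y; Z: elements no better than y.
    upper = [u_on_subset[b] for b in dense_subset if weakly_prefers(b, y)]
    lower = [u_on_subset[b] for b in dense_subset if weakly_prefers(y, b)]
    r1 = min(upper)   # greatest lower bound of u over X
    r2 = max(lower)   # least upper bound of u over Z
    return (r1 + r2) / 2


# Hypothetical example: A is the unit interval ordered by >=, and the
# finite stand-in for B is a coarse grid that includes both endpoints.
B = [0.0, 0.25, 0.5, 0.75, 1.0]
u_B = {b: b for b in B}

value = extend_by_cut(0.3, B, u_B, lambda a, b: a >= b)
print(value)
```

Here the element 0.3 falls in the cut between 0.25 and 0.5, so it receives the midpoint 0.375, which preserves its order relative to every element of the grid.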

It is, of course, possible to generalize Theorem 3 by no longer requiring
the utility function to be real-valued. General theorems for any preference
relation, that is, for any weak ordering of a set, are given in Chipman
(1960a). He shows that if for the range of the utility function we replace
the real numbers by transfinite sequences of real numbers under their
natural lexicographic order, then such a utility function exists for any
preference relation. Such generalizations are not pursued here, for they
are of dubious interest for psychological theory.


2.2 Topological Assumptions*

* This section can be omitted without loss of continuity.

It is easily seen that, although the utility function we constructed in the
proof of Theorem 1 is continuous, the construction may be modified to
obtain a discontinuous function satisfying Eq. 1. For example, for a
diagonal x let

$$u'(x) = \begin{cases} x_1 & \text{if } x_1 \leq 1, \\ x_1 + 1 & \text{if } x_1 > 1. \end{cases}$$

In the theory of preference it is almost always desirable to deal with
well-behaved functions, for which continuity is often a desirable if not
always sufficient restriction. Two questions arise. First, once we drop
the restrictive assumption that the alternatives are n-dimensional vectors,
what shall we mean by continuity, and what postulates guarantee the
existence of at least one continuous utility function? Second, how can we
guarantee that all, not merely one, of the possible utility functions are
continuous?

To answer these questions requires a brief excursion into topology.
Our treatment of these matters is brief, even though topological
considerations have been prominent in the economic literature
on utility, because it is not clear that they are of direct
significance for the research of psychologists. Our discussion is intended
only to give the reader some feeling for the generality of topological
methods without pretending to give an adequate
introduction to the subject. We assume no prior knowledge of topology,
but we do suppose that the reader is familiar with the notion of continuity
that is defined in textbooks on the calculus.

Intuitively, a function from real numbers to real numbers is continuous
if its graph has no jumps or gaps in it. The definition that makes
this intuition precise is easily extended to functions from n-dimensional
vectors of real numbers to the real numbers. However, when we consider
functions defined over arbitrary sets, there is no immediate method of
defining continuity; indeed, unless some additional structure is imposed
on the set, nothing can be done. The type of structure that permits us to
formulate a general definition of continuity is known as a topology.
The most important property of continuous functions is that they map
points that are "near" one another into points that are again "near" one
another. Put another way, the function does not create too much
scrambling of points that are reasonably close to each other. In talking
about functions of real numbers, this idea is adequately captured by
requiring that intervals be mapped into intervals. Let us restrict
ourselves to the open intervals, that is, to intervals that do not include
their endpoints. Then the natural topology of the real line is the family
of open intervals together with the sets that are formed from arbitrary
unions and finite intersections of open intervals. Members of this family
of sets are called open sets. It is these last ideas that are generalized
to define topologies for arbitrary sets.

Definition 2. A pair $(X, \mathcal{T})$ is a topological space if
$\mathcal{T}$ is a family of subsets of X, called open sets, such that

(i) the empty set is in $\mathcal{T}$;

(ii) X is in $\mathcal{T}$;

(iii) the union of arbitrarily many sets in $\mathcal{T}$ is also in $\mathcal{T}$;

(iv) the intersection of any finite number of sets in $\mathcal{T}$ is in $\mathcal{T}$.

We also say that $\mathcal{T}$ is a topology for X.
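
For a finite set the four conditions of Definition 2 can be checked mechanically. The following Python sketch is only an illustration; the function name and the example families are assumptions, and the pairwise test suffices only because the family is finite.

```python
from itertools import combinations


def is_topology(points, family):
    """Check conditions (i)-(iv) of Definition 2 for a finite family of sets."""
    X = frozenset(points)
    opens = {frozenset(s) for s in family}
    if any(not s <= X for s in opens):          # members must be subsets of X
        return False
    if frozenset() not in opens or X not in opens:   # conditions (i), (ii)
        return False
    # In a finite family, closure under pairwise unions and intersections
    # gives closure under all unions and all finite intersections.
    for a, b in combinations(opens, 2):
        if a | b not in opens or a & b not in opens:  # conditions (iii), (iv)
            return False
    return True


nested = [set(), {1}, {1, 2}, {1, 2, 3}]
broken = [set(), {1}, {2}, {1, 2, 3}]   # the union {1, 2} is missing
print(is_topology({1, 2, 3}, nested), is_topology({1, 2, 3}, broken))
```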

Correspondingly, the ordinary ε-δ, that is, interval, definition of
continuous functions is generalized to a condition in terms of open sets.
Definition 3. A function from one topological space into another is
continuous if the inverse image of every open set is open. More precisely,
let $(X, \mathcal{T})$ and $(Y, \mathcal{U})$ be topological spaces, and let
f be a function from X into Y. Then f is $\mathcal{T}$-$\mathcal{U}$
continuous if whenever $U \in \mathcal{U}$, then $f^{-1}(U) \in \mathcal{T}$.
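
Definition 3 can likewise be tested directly on finite spaces by computing inverse images. The sketch below is illustrative only; the function name, the dictionary representation of f, and the two small spaces are assumptions.

```python
def is_continuous(f, domain_points, domain_opens, codomain_opens):
    """True when the inverse image under f of every open set is open."""
    opens_X = {frozenset(s) for s in domain_opens}
    for U in codomain_opens:
        preimage = frozenset(x for x in domain_points if f[x] in U)
        if preimage not in opens_X:
            return False
    return True


# Hypothetical finite spaces.
X = {1, 2, 3}
T_X = [set(), {1}, {1, 2}, {1, 2, 3}]
T_Y = [set(), {"hi"}, {"lo", "hi"}]

f = {1: "hi", 2: "lo", 3: "lo"}   # inverse image of {"hi"} is {1}, open
g = {1: "lo", 2: "lo", 3: "hi"}   # inverse image of {"hi"} is {3}, not open

print(is_continuous(f, X, T_X, T_Y), is_continuous(g, X, T_X, T_Y))
```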
Finally, we define the notion of separability of a topological space, for
which it is useful first to introduce the notion of a base.
Definition 4. A base for the topology $\mathcal{T}$ is a class
$\mathcal{B}$ of open sets (that is, members of $\mathcal{T}$) such that,
for every x in X and every open set T containing x, there is a set B in
$\mathcal{B}$ such that $x \in B$ and $B \subseteq T$.

The natural topology of the real line defined above is determined by the
class of open intervals, which is a base.

Definition 5. A topological space $(X, \mathcal{T})$ is called separable if
$\mathcal{T}$ has a countable base.

It is easy to construct a countable base for the real line and,
in fact, for any n-dimensional Euclidean space.

The theory of continuous utility functions would be flawed if the
search for continuous functions required a topology on the set of objects
or outcomes that is extraneous to the preference relation itself.
Fortunately, this is not the case. Any relation on a set generates a
topology on the set in precisely the same way that the natural topology of
the real line is generated from the open intervals. For a set A with
relation R, we define an R-open interval to be any subset of A of the
following form:

$$(x, y) = \{z \mid z \in A \ \&\ xPzPy\},$$

where P is defined in terms of R as indicated earlier. The R-order topology
for A is then the collection of sets generated from R-open intervals by
arbitrary unions and finite intersections. It is apparent that the order
topology on the set of real numbers generated by the relation ≥ is the
natural topology already defined in terms of open intervals.
Theorem 4. Let R be a preference relation for the infinite set A, and let
the R-order topology for A be separable. Then there exists on A a
continuous, real-valued, order-preserving utility function.
The proof of this theorem is too long to include here. A proof of
essentially the same theorem is given by Debreu (1954); as far as we
know, his paper contains the first discussion of the continuity of utility
functions from a topological standpoint. For additional discussion see
Chipman (1960a), Debreu (1963), Murakami (1963), and Newman and
Read (1961).

Theorem 4 settles positively the first of the two questions raised at the
beginning of this section. The second question concerned conditions
that would guarantee the continuity of all the utility functions on A.
It should be apparent that some condition beyond Eq. 1, that is, beyond
the order-preserving property, must be imposed in order to obtain this
result. The matter does not seem to have been discussed in the literature
as extensively as is desirable, but additional sufficient conditions can be
formulated by requiring that the utility function preserve the structure of
the R-order topology for the set A.


*2.3 Additivity Assumptions

In the latter part of the nineteenth century, the economists Jevons,
Walras, and, to a certain extent, also Marshall made the assumption that
the utility of a commodity bundle $(x_1, \ldots, x_n)$ is just the sum of the
utilities of the individual components, that is,

$$u(x_1, \ldots, x_n) = u_1(x_1) + \cdots + u_n(x_n). \tag{2}$$

This assumption was made in the context of the development of consumer
behavior until Edgeworth, in his Mathematical Psychics (1881), noticed
that for the purposes of the economic theory of the time there was no
need to assume Eq. 2 as well as little justification for doing so. A few
years later Pareto (1906) took the next step and observed that only ordinal
assumptions were required. The upshot of this evolution of ideas in
economics is that additivity assumptions have not been much investigated
or used in the economic literature.

On the other hand, the additivity assumption represented by Eq. 2
does seem to hold promise for several kinds of psychological investigations
of preference and choice. The only directly pertinent studies known to us
are Adams and Fagot (1959), Debreu (1960a), Edwards (1954c), Fagot
(1956), and Luce and Tukey (1964). Most of the things we shall discuss
in this section are to be found in the works of Adams and Fagot, Debreu,
and Luce and Tukey.

To begin with, an additive utility model may be a reasonable candidate
for application in any choice situation in which a selection among
multicomponent alternatives is required. For example, the model might apply
to an attitude scaling task in which subjects are asked to choose between
political candidates varying on several dimensions, say, tax policy,
foreign policy, and civil liberties legislation. It should be realized that
even for small sets of alternatives, such as three or four political
candidates, the additivity assumption is not automatically satisfied, and
thus its assumption has immediate behavioral implications. As an example,
let $A_1 = \{a, b\}$ and $A_2 = \{x, y\}$, and let R be a preference
relation on the Cartesian product $A_1 \times A_2$. Suppose that the strict
preference ordering is

(a, x)P(a, y)P(b, y)P(b, x).

For this preference ordering it is apparent at once that there are no
numerical functions $u_1$ and $u_2$ such that

$$u_1(a) + u_2(x) > u_1(a) + u_2(y),$$

$$u_1(b) + u_2(y) > u_1(b) + u_2(x),$$

for the first inequality implies $u_2(x) > u_2(y)$, and the second,
$u_2(x) < u_2(y)$.

The possible nonexistence of an additive utility function raises the
problem of formulating a set of conditions on the observable choices that
guarantee the existence of such a function. For reasons that are discussed
in more detail in Sec. 2.4, this problem seems incapable of a completely
satisfactory general solution. It is perhaps desirable to be a little more
explicit about what we mean by this. For any particular finite set of data
it is, of course, possible to decide whether or not there exists an
additive utility function that represents the data in the sense of Eq. 2 and
that is order-preserving, because the problem for any particular case is
just one of solving a finite set of linear homogeneous inequalities. Adams
and Fagot, in fact, give one constructive procedure for solving the
inequalities.
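
The decision problem for a particular finite case can be sketched by brute force. The Python fragment below is not the constructive procedure of Adams and Fagot; it is an assumed stand-in that searches a small grid of integer values for component utilities whose sums reproduce a given strict ranking. A failure to find values within the grid is therefore only suggestive in general, although for the 2 × 2 ordering discussed above no solution exists on any grid.

```python
from itertools import product


def find_additive_utility(ranking, levels=range(5)):
    """Search for u1, u2 with u1[a] + u2[x] strictly decreasing down the ranking.

    ranking: list of (first, second) pairs, most preferred first.
    Returns (u1, u2) as dictionaries, or None if the grid has no solution.
    """
    firsts = sorted({p[0] for p in ranking})
    seconds = sorted({p[1] for p in ranking})
    k = len(firsts)
    for vals in product(levels, repeat=k + len(seconds)):
        u1 = dict(zip(firsts, vals[:k]))
        u2 = dict(zip(seconds, vals[k:]))
        totals = [u1[a] + u2[x] for a, x in ranking]
        if all(hi > lo for hi, lo in zip(totals, totals[1:])):
            return u1, u2
    return None


# The ordering of the text, which admits no additive representation.
crossed = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "x")]
# A hypothetical ordering that does admit one.
regular = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]

print(find_additive_utility(crossed), find_additive_utility(regular) is not None)
```

A linear programming routine would replace the grid search in any serious application; the grid is used here only to keep the sketch self-contained.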

The general problem, on the other hand, is to impose structural
conditions on the relation R which guarantee a solution. Transitivity
is an example of a necessary but not sufficient structural condition.
Another necessary structural condition that eliminates the counterexample
given previously is the condition of independence:


attd 


If (o, x)R{a, y), then (h, x)R{b, y), 
jf (a, ^R{b, x), then {a, y)R{b, y) 


C3) 


It IS easy to see that Eq 3 follows from Eqs I and 2 This is called the 
condition of independence because interaction between the components 
IS ruled out In the language of economists, the two components must 
he neither competitive nor complementary Adams and Fagot give a 
counterexample in which each set Ai and has three elements to show 
that the addition of Eq 3 is not sufficient to guarantee the existence of an 
additive utility function 
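
A minimal sketch of how the independence condition, Eq. 3, might be checked on a finite preference ordering is given here. The encoding of the ordering as a dictionary of ranks and the function name are assumptions made for illustration; they do not come from the sources cited.

```python
from itertools import product

def violates_independence(rank, A1, A2):
    """Return a witness of a violation of Eq. 3, or None.

    rank maps each pair (a, x) to a number; larger means more preferred,
    so (a, x) R (b, y) is read as rank[(a, x)] >= rank[(b, y)].
    """
    def R(p, q):
        return rank[p] >= rank[q]

    # First half of Eq. 3: the ordering of x and y must not depend on a.
    for a, b in product(A1, repeat=2):
        for x, y in product(A2, repeat=2):
            if R((a, x), (a, y)) and not R((b, x), (b, y)):
                return ("second component", a, b, x, y)
    # Second half: the ordering of a and b must not depend on x.
    for a, b in product(A1, repeat=2):
        for x, y in product(A2, repeat=2):
            if R((a, x), (b, x)) and not R((a, y), (b, y)):
                return ("first component", a, b, x, y)
    return None

# The earlier counterexample: (a,x) P (a,y) P (b,y) P (b,x).
counterexample = {("a", "x"): 4, ("a", "y"): 3, ("b", "y"): 2, ("b", "x"): 1}
# An ordering generated by u1(a)=2, u1(b)=0, u2(x)=1, u2(y)=0.
additive = {("a", "x"): 3, ("a", "y"): 2, ("b", "x"): 1, ("b", "y"): 0}

witness = violates_independence(counterexample, ["a", "b"], ["x", "y"])
no_witness = violates_independence(additive, ["a", "b"], ["x", "y"])
print(witness)
print(no_witness)
```

The counterexample fails the first half of Eq. 3, since $x$ is preferred to $y$ in the presence of $a$ but not in the presence of $b$.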

The structural conditions we have just discussed are open conditions in
the sense that they apply to all sets of alternatives for which additivity
holds. They do not postulate the existence of any special alternatives nor
do they impose on the whole pattern of alternatives a condition that is
sufficient but not necessary for the existence of an additive utility
function. As we have already remarked, and as we shall indicate in more
detail in Sec. 2.4, there are deep reasons for the difficulty, if not
impossibility, of giving a fixed finite list of open structural conditions
that are sufficient to guarantee the existence of an additive utility
function, even for two-component alternatives that are finite in number.
However, Scott (1964) has given an infinite list of open conditions, in
the form of a fairly simple scheme, which are jointly necessary and
sufficient. His results are discussed in a slightly different context in
Sec. 2.4.

Once open conditions are abandoned, a relatively simple set of sufficient
conditions for the existence of two-dimensional additive utility functions
can be given by the axioms of conjoint measurement formulated by Luce
and Tukey (1964). Moreover, they can be readily generalized to handle
any finite number of components. For simplicity, however, we consider
only the two-component case.

We begin with a preference relation $R$ on $A_1 \times A_2$, where $A_1$
and $A_2$ are (not necessarily disjoint) sets. The first two axioms are
necessary open conditions, namely,

1. Weak ordering. $R$ is a weak ordering of $A_1 \times A_2$.

2. Cancellation. For all $a, b, f \in A_1$ and $p, x, y \in A_2$, if
$(a, p)R(f, y)$ and $(f, x)R(b, p)$, then $(a, x)R(b, y)$.

The second axiom can be cast in an apparently simpler form by defining
a new relation $D$ on $A_1 \times A_2$ which corresponds to a comparison of
differences instead of sums of components (see Sec. 2.4), namely,

$(a, x)D(b, y)$ if and only if $(a, y)R(b, x)$.

Then the cancellation axiom is simply the assertion that $D$ is transitive.
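
The equivalence between cancellation and the transitivity of $D$ can be checked mechanically on a small example. The following sketch assumes hypothetical integer utilities on two three-element sets, builds $R$ additively, and tests both properties by enumeration; it illustrates the two statements on one structure and is not a proof.

```python
from itertools import product

u1 = {"a": 0, "b": 3, "f": 8}   # hypothetical utilities on A1
u2 = {"p": 1, "x": 4, "y": 6}   # hypothetical utilities on A2

def R(s, t):
    """(a, x) R (b, y) under the additive representation."""
    (a, x), (b, y) = s, t
    return u1[a] + u2[x] >= u1[b] + u2[y]

def D(s, t):
    """(a, x) D (b, y) if and only if (a, y) R (b, x)."""
    (a, x), (b, y) = s, t
    return R((a, y), (b, x))

pairs = list(product(u1, u2))

# Cancellation: (a,p)R(f,y) and (f,x)R(b,p) imply (a,x)R(b,y).
cancellation_holds = all(
    R((a, x), (b, y))
    for a, b, f in product(u1, repeat=3)
    for p, x, y in product(u2, repeat=3)
    if R((a, p), (f, y)) and R((f, x), (b, p))
)

# Transitivity of D over all triples of pairs.
d_is_transitive = all(
    D(s, v) for s, t, v in product(pairs, repeat=3) if D(s, t) and D(t, v)
)

print(cancellation_holds, d_is_transitive)
```
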
The third axiom is not open in that it postulates that there are adequate
elements in $A_1$ and $A_2$ so that certain equations can always be solved.

3. Solution of equations. For any $a, b \in A_1$ and $x, y \in A_2$, there
exist $f \in A_1$ and $p \in A_2$ such that

$(f, x)I(a, y)$ and $(a, x)I(b, p)$.

From these three assumptions it is not difficult to show that the
condition of independence, Eq. 3, holds, and so the following induced
orderings $R_1$ on $A_1$ and $R_2$ on $A_2$ are well defined:

$aR_1b$ if and only if, for $x \in A_2$, $(a, x)R(b, x)$,
$xR_2y$ if and only if, for $a \in A_1$, $(a, x)R(a, y)$.

With these definitions, we may introduce the main constructive device
used by Luce and Tukey in their proof and in terms of which they
introduced their final axiom. A nontrivial dual standard sequence (dss)
is any set of elements of the form $\{(a_i, x_i) \mid i$ any positive or
negative integer or $0\}$ for which

(i) if $i \neq j$, then not $a_i I_1 a_j$ and not $x_i I_2 x_j$,

(ii) for all $i$, $(a_i, x_i)I(a_{i+1}, x_{i-1})$,

(iii) for all $i$, $(a_{i+1}, x_i)I(a_i, x_{i+1})$.

The final axiom is 


4. Archimedean. For any nontrivial dss $\{(a_i, x_i)\}$ and for any
$(b, y) \in A_1 \times A_2$, there exist integers $m$ and $n$ such that


With these four axioms, the following result can be proved.

Under these five assumptions, Debreu showed that there exist real-valued
continuous functions $u_i$ on $A_i$ such that for $a, b \in A_1$ and
$x, y \in A_2$,

$(a, x)R(b, y)$ if and only if $u_1(a) + u_2(x) \geq u_1(b) + u_2(y)$.

Moreover, $u_1 + u_2$ is unique up to positive linear transformations. By
suitably generalizing these assumptions (only 4 is the least bit tricky),
he was able to show that the obvious generalized representation holds for
any finite number of components. The proof is lengthy and is not
included here.

Note added in proof. J. Aczel has pointed out to us that there are
results in the mathematical literature that are closely related to the
question of the existence of additive utility functions. These are classed
under the theory of nets or, as it was called somewhat earlier, the theory
of webs (in German, the language of many of the publications, Gewebe).
Roughly, the problem treated there is this: given three families of
(abstract) curves in a plane for which (1) each point belongs to exactly
one curve in each family (and so two different curves of the same family
do not intersect) and (2) curves from two different families have exactly
one point in common, when is this configuration equivalent (in either an
algebraic or a topological sense) to three families of parallel straight
lines? The intersection assumptions, (1) and (2), are very similar to, but
stronger than, the assumption that equations can be solved, Axiom 3
above. A necessary and, with these intersection assumptions, an
essentially sufficient condition for such a mapping to exist is the
cancellation axiom (Axiom 2 above) with equalities replacing the
inequalities. In the mathematical literature, this important property is
known as the Thomsen condition. For a detailed discussion of such
results, and for further references to the literature, see Aczel (1964).


2.4 Higher Ordered Metrics

Coombs (1950) introduced the term ordered metric to designate those
scales that generate an ordering on the set of alternatives and, in
addition, on the differences between alternatives. These ordered metric
scales are something stronger than ordinal scales but not as strong as
interval scales. Coombs [...] the development of methods for [...]
individuals on a common scale. Further development of ordered metric
scales was given in Coombs (1952), and a few years later [...] behavior
were made by Adams and Fagot (1959), Fagot (1959), Hurst and Siegel
(1956), Siegel (1956), and others.

If we speak in terms of the utility difference, or difference in
preference, between pairs of alternatives, then the [...] of economists is
that choices between alternatives do not yield any evidence on these
differences. One can, of course, [...] introspective [...] method of
introspection [...] on choice behavior, it is nonetheless natural to ask
whether [...] be overcome by designing operational [...] relatively direct
behavioral evidence on judgments of utility differences. In Sec. 3 we
discuss several methods for doing this with uncertain outcomes, with
choices expressed between [...]. Without introducing any element of risk,
however, a technique can be used that is very close to the one underlying
[...]: it is simply to present two pairs of objects and to ask for a choice
between the pairs. To translate the response into an expression of utility
differences, we make the following obvious transformation. If the pair
$(a, b)$ is chosen over the pair $(c, d)$, then

$u(a) + u(b) \geq u(c) + u(d)$,

which holds if and only if

$u(a) - u(c) \geq u(d) - u(b)$,

and this last inequality is just a way of expressing an ordering of
differences. This parallels the correspondence between the relations $R$
and $D$ discussed in connection with the cancellation axiom in Sec. 2.3.

Two other methods for obtaining direct behavioral evidence on utility
differences which do not require uncertain outcomes are described by
Suppes and Winet (1955). One interpretation depends on equating
utility differences with different amounts of money. It depends only on
the assumption that amount of money is a monotonic increasing function
of utility difference. A linear relationship is not required. Of course,
this method is restricted to alternatives that can be thought of as objects
or commodities. Suppose that the subject prefers Commodity $x$ to
Commodity $y$ and also prefers Commodity $p$ to Commodity $q$. Suppose
further that he has in his possession Commodities $y$ and $q$ and that we
present him with the opportunity of paying money to replace $y$ by $x$ and
$q$ by $p$. The utility difference between $x$ and $y$ is at least as great
as that between $p$ and $q$ if and only if he will pay at least as much
money to replace $y$ by $x$ as to replace $q$ by $p$. Let it be emphasized
that there is nothing special about the use of money in this method. Any
sort of object or activity that may be represented on a continuum is
suitable. For example, an alternative operational definition of utility
differences which has exactly the same form can be given in terms of
work. In economic terms, what is needed is simply a commodity that, like
money, is flexible enough to serve in different situations and such that
its marginal utility is either always positive or always negative in the
situations under consideration.

A second method, which eliminates the need for a commodity or
activity external to the set of alternatives, is a rather slight modification
of the above procedure. Following an example of Suppes and Winet,
suppose that we confront a housewife with six household appliances of
approximately the same monetary value that she does not have:
a mixer, a toaster, an electrical broiler, a blender, a waffle iron, and a
waxer. Two of the appliances are selected at random and presented to her.
Suppose that they are the toaster and the waxer. She is then confronted
with the choice of trading either the toaster for the waffle iron or the
waxer for the blender. Her choice indicates whether the utility difference
between the waffle iron and the toaster is at least as great as the
difference between the blender and the waxer (due account being taken of
the algebraic sign of the difference). A sequence of such [...] can be
easily [...]. Unfortunately, however, we know of no [...] the measurement
of utility differences.

[...] higher ordered metric scales is a somewhat complicated [...] present
a complete survey here. Fortunately, [...] discussion of these matters has
already been covered in Chapter 1 of Vol. I, particularly in Secs. 3.3
and 3.4.

[...] between alternatives is formulated precisely by the following [...].
It is, however, convenient to replace the concept of a partial [...] by
that of a quasi-ordering on the differences of alternatives (a
quasi-ordering is a transitive and reflexive relation, as expressed in the
first two axioms).

Definition 6. A quaternary relation $D$ on a set $A$ is a weak higher
ordered metric if and only if the following five axioms are satisfied for
every $a$, $b$, $c$, $d$, $e$, and $f$ in $A$:

1. if $abDcd$ and $cdDef$, then $abDef$;

2. $abDab$;

3. if $abDcd$, then $acDbd$;

4. if $abDcd$, then $dcDba$;
5. $abDbb$ or $baDbb$.

The intended interpretation of the quaternary relation $D$ is that $abDcd$
if and only if $u(a) - u(b) \geq u(c) - u(d)$. In terms of this intended
numerical interpretation, the meanings of Axioms 3 and 4 should be clear.
Axiom 3 expresses what has been called in the literature the quadruple
condition, initially formulated by Marschak (first stated in print, as far
as we know, in Davidson and Marschak, 1959; also see Sec. 5.4), and
Axiom 4 expresses a natural condition on the sign of the differences.
Axiom 5 states a restricted connectivity condition needed to obtain a weak
ordering on alternatives as opposed to differences between alternatives.
To show how a complete ordering of alternatives may be obtained from
these axioms, we define the weak preference relation $R$ as follows.

Definition 7. $aRb$ if and only if $abDbb$.

The intended interpretation, of course, is that $aRb$ holds just when $a$
is weakly preferred to $b$. In terms of the desired numerical
interpretation, $aRb$ if and only if $u(a) \geq u(b)$. We now prove

Theorem 6. $R$ is a weak ordering.

PROOF. (1) If $aRb$ and $bRc$, then $aRc$.

By hypothesis $aRb$ and $bRc$. In terms of the definition, this means that
$abDbb$ and $bcDcc$; from Axiom 2 we have $bcDbc$ and so, by Axiom 3,
$bbDcc$; from Axiom 4 and one of the hypotheses we have $ccDcb$; and
thus by transitivity (Axiom 1) $abDcb$. However, by Axiom 3 this means
that $acDbb$. Once again using the fact that $bbDcc$ and transitivity, we
obtain $acDcc$ which, by virtue of the definition of $R$, is equivalent to
$aRc$, as desired.

(2) $aRb$ or $bRa$.

From Axiom 5 we have $abDbb$ or $baDbb$. If $abDbb$, then $aRb$ from the
definition of $R$. On the other hand, if $baDbb$, then from Axioms 2 and 3
we also have $bbDaa$ and thus by transitivity $baDaa$, and therefore by
definition $bRa$. This concludes the proof.
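
Theorem 6 can be illustrated on a small numerical model. The sketch below assumes hypothetical utilities, defines $D$ by the intended interpretation, and then verifies the five axioms and the two properties of $R$ by enumeration. Axioms 1 and 2 are taken here to be transitivity and reflexivity of $D$, as the discussion of quasi-orderings and the proof itself indicate.

```python
from itertools import product

u = {"a": 0, "b": 2, "c": 2, "d": 7}   # hypothetical utilities
A = list(u)

def D(a, b, c, d):
    """abDcd under the intended numerical interpretation."""
    return u[a] - u[b] >= u[c] - u[d]

def R(a, b):
    """Definition 7: aRb if and only if abDbb."""
    return D(a, b, b, b)

quads = list(product(A, repeat=4))
axioms = {
    1: all(D(a, b, e, f) for a, b, c, d, e, f in product(A, repeat=6)
           if D(a, b, c, d) and D(c, d, e, f)),          # transitivity
    2: all(D(a, b, a, b) for a, b in product(A, repeat=2)),  # reflexivity
    3: all(D(a, c, b, d) for a, b, c, d in quads if D(a, b, c, d)),
    4: all(D(d, c, b, a) for a, b, c, d in quads if D(a, b, c, d)),
    5: all(D(a, b, b, b) or D(b, a, b, b) for a, b in product(A, repeat=2)),
}

r_transitive = all(R(a, c) for a, b, c in product(A, repeat=3)
                   if R(a, b) and R(b, c))
r_connected = all(R(a, b) or R(b, a) for a, b in product(A, repeat=2))

print(axioms, r_transitive, r_connected)
```

Because this $D$ comes from a numerical function it is also connected, which the axioms of a weak higher ordered metric do not require; the example therefore shows the theorem at work but not the full generality of the definition.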

Additional elementary theorems can be proved, but it is not possible
to go on to prove that there exists a real-valued function $u$ defined on
the set $A$ such that

$abDcd$ if and only if $u(a) - u(b) \geq u(c) - u(d)$.    (4)

An immediate reason is that if such a function $u$ were to exist, then we
would necessarily have a weak ordering of differences between
alternatives, but our axioms only guarantee a quasi-ordering. The
implication of not having such a function $u$ is that we cannot use
numerical techniques in working with a weak ordered metric. It is
necessary to restrict ourselves to elementary properties of the quaternary
difference relation $D$ and of the ordering relation $R$, thereby severely
limiting the possibility of connecting a weak higher ordered metric to
other parts of an analysis of choice behavior.

It seems a natural move, therefore, to go from a weak higher ordered
metric to a strong higher ordered metric by requiring that the quaternary
difference relation itself be connected. This is expressed in the
following definition.

Definition 8. Let $D$ be a weak higher ordered metric on the set $A$. Then
$D$ is a strong higher ordered metric if for every $a$, $b$, $c$, and $d$
in $A$, the axiom of strong connectivity holds, that is, $abDcd$ or
$cdDab$.

This last higher ordered metric is essentially the one applied by Hurst
and Siegel (1956) and Siegel (1956). These experiments are discussed in
Sec. 4.1.

Unfortunately, even with its strong conditions, it is not possible to show
that there exists a numerical representing function $u$ preserving the
structure of the difference relation $D$ of a higher ordered metric. An
example showing this was given by Scott and Suppes (1958). Moreover, by
extending their initial example, they showed that no finite list of axioms
that involve only open conditions (as defined in Sec. 2.3), that is, that
do not involve existential assertions, is sufficient to guarantee the
existence of a numerical representing function. From a psychological
standpoint, this means that the structural conditions on behavior required
to strengthen a strong higher ordered metric sufficiently to obtain a
numerical representation and yet not sufficiently strong to guarantee an
interval scale are quite complicated in character.

Because of the close connection between additivity assumptions and
conditions on a difference relation, it should also be apparent that the
same negative results apply a fortiori to the imposition of a fixed finite
number of open structural conditions to guarantee the existence of an
additive utility function.

Scott (1964) has, however, shown that an open schema that represents a
bundle of axioms whose number increases with the cardinality of the set of
alternatives is adequate.

Definition 9. A quaternary relation $D$ on a set $A$ is a strict higher
ordered metric if and only if the following axioms are satisfied:

1. if $abDcd$, then $acDbd$;

2. $abDcd$ or $cdDab$;

3. for all sequences of elements $a_0, \ldots, a_n, b_0, \ldots, b_n \in A$,
and all permutations $\pi$, $\sigma$ of $\{0, 1, \ldots, n\}$, if
$a_i b_i D a_{\pi(i)} b_{\sigma(i)}$ for $0 \leq i < n$, then
$a_{\pi(n)} b_{\sigma(n)} D a_n b_n$.

To make the meaning of Axiom Scheme 3 clearer, we may derive the
transitivity of $D$. To apply Condition 3, it is convenient to formulate
transitivity as follows: if $a_0b_0Da_1b_1$ and $a_1b_1Da_2b_2$, then
$a_0b_0Da_2b_2$. Now let $\pi$ and $\sigma$ be the permutations of
$\{0, 1, 2\}$ such that $\pi(0) = \sigma(0) = 1$, $\pi(1) = \sigma(1) = 2$,
and $\pi(2) = \sigma(2) = 0$. Application of Condition 3 for this $\pi$ and
$\sigma$ yields the desired result at once.

What Axiom Scheme 3 really comprises is all possible cancellation laws,
some simple examples of which were discussed in Sec. 2.3. The difficulty
with this axiom from a psychological standpoint is that there seems to be
no simple way of summarizing what it says about choice behavior, but
this we take to be an inherent complexity of the structural relations that
must hold between elements of any finite set in order to guarantee the
existence of a utility function that preserves the order of utility
differences. Scott established the following theorem.
Theorem 7. A necessary and sufficient condition that a quaternary relation
$D$ on a finite set $A$ has a representing utility function in the sense of
Eq. 4 is that $D$ be a strict higher ordered metric on $A$.

The relation between strict and strong higher ordered metrics immediately
follows from Theorem 7.

Theorem 8. Every strict higher ordered metric $D$ on a finite set $A$ is
also a strong higher ordered metric on $A$.
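
The necessity half of Theorem 7 can be checked by enumeration for small $n$. The sketch below assumes that the conclusion of Axiom Scheme 3 reads $a_{\pi(n)} b_{\sigma(n)} D a_n b_n$, which is the form that the transitivity derivation above requires; the utilities and the function name are hypothetical.

```python
from itertools import permutations, product

u = {"p": 0, "q": 1, "r": 3}   # hypothetical utilities on a three-element set
A = list(u)

def D(a, b, c, d):
    """abDcd in the sense of Eq. 4."""
    return u[a] - u[b] >= u[c] - u[d]

def scheme_holds(n):
    """Test Axiom Scheme 3 for sequences of length n + 1 by brute force."""
    idx = range(n + 1)
    for a in product(A, repeat=n + 1):
        for b in product(A, repeat=n + 1):
            for pi in permutations(idx):
                for sg in permutations(idx):
                    premises = all(
                        D(a[i], b[i], a[pi[i]], b[sg[i]]) for i in range(n))
                    if premises and not D(a[pi[n]], b[sg[n]], a[n], b[n]):
                        return False
    return True

results = {n: scheme_holds(n) for n in (1, 2)}
print(results)
```

The case $n = 2$ with $\pi = \sigma = (1, 2, 0)$ is exactly the transitivity instance worked out in the text. The reason no violation can occur is that the differences on the two sides of all $n + 1$ comparisons sum to the same total, so the last comparison is forced by the first $n$.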

The many conveniences that follow from having a numerical representation
have led to the investigation of several methods of extending strong
higher ordered metric scales to guarantee the existence of a numerical
representing function. One procedure, that of Suppes and Winet (1955),
is to add strong conditions that require the set of alternatives to be
infinite. This amounts to adding to the axioms for strong higher ordered
metrics those for infinite difference systems (Axioms 5, 6, and 7) as
stated in Chapter 1 of Vol. I, p. 35. Because a detailed discussion of
infinite difference structures is given there, we do not pursue them
further here. A second alternative is to build up on the concept of a
finite equal difference system (Def. 16, Chapter 1, Vol. I, p. 39). The
intuitive idea of such an equal difference system is to keep the set of
alternatives finite, but to impose the strong structural condition that
alternatives be equally spaced in terms of utility differences. This means
that the total structure is analogous to a single dual standard sequence in
the sense of Luce and Tukey. The assumption that alternatives are equally
spaced is, in general, much too special and restrictive for applications,
but if we can assume the availability of a standard sequence of equally
spaced alternatives, we can then use them to provide a scale for the
approximate measurement of the utility of other alternatives.

The axioms for the equally spaced alternatives are extremely simple,
namely, those for strong higher ordered metrics plus the assumption
needed to guarantee equal spacing. Because of their simplicity, we
include them here. In order to formulate the additional axiom for the
standard sequence, we need the notion that one element in the standard
sequence is an immediate $R$-successor of another. Denoting by $S$ the
subset that is the standard sequence of alternatives, the definition of the
immediate successor relation $J$ for $S$ is just the following.

Definition 10. $aJb$ if and only if $aPb$ and, for all $c$ in $S$, $aPc$
implies $bRc$.

To define a finite difference system with a standard sequence, we
impose only the additional assumption that any alternative is bounded
from above and below by members of the standard sequence.

Definition 11. A finite difference system with a standard sequence is a
triple $\mathcal{A} = (A, S, D)$ in which the following axioms are
satisfied:

1. the set $A$ is finite and $S$ is a subset of $A$;

2. the quaternary relation $D$ is a strong higher ordered metric on $A$;

3. for $a$, $b$, $c$, and $d$ in the standard sequence $S$, if $aJb$ and
$cJd$, then $abDcd$ and $cdDab$;

4. for every $b$ in $A$, there are alternatives $a$ and $c$ in the standard
sequence $S$ such that $aRb$ and $bRc$.

The full proof of the following theorem, although rather long, is entirely
elementary, and therefore we shall omit it (see Suppes, 1957, pp. 267-274).

Theorem 9. If $\mathcal{A} = (A, S, D)$ is a finite difference system with
a standard sequence, then there is a real-valued function $u$ defined on
$A$ such that

(i) for all $a$, $b$ in $A$,

$aRb$ if and only if $u(a) \geq u(b)$,

(ii) for all $a$, $b$, $c$, and $d$ in the standard sequence $S$,

$abDcd$ if and only if $u(a) - u(b) \geq u(c) - u(d)$.

Moreover, any function $u'$ satisfying (i) and (ii) is related to $u$ by a
linear transformation on $S$. Let $u'$ be related to $u$ on the set $S$ by
the transformation $u'(a) = \alpha u(a) + \beta$, for every $a$ in $S$, and
for any $a$ and $c$ in $S$ such that $aJc$ define the $u$-unit as the
difference $u(a) - u(c)$. Then for any $b$ in $A$, $u$ and $u'$ are related
as follows: [...]


[...] For alternatives not in the standard sequence, measurements are made
to within the accuracy of the unit interval between adjacent members of
the standard sequence. Approximations in terms of a standard sequence,
but based on a somewhat different rationale, were used experimentally by
Davidson, Suppes, and Siegel (1957); however, the alternatives were bets
involving uncertain outcomes. The discussion of these results is given in
Sec. 4.2.
We began this section on ordered metrics by mentioning Coombs' 1950
article, but as yet we have not mentioned a second important idea
contained in that article. This is Coombs' concept of an unfolding
technique, which is designed to place both individuals and alternatives on
the same scale. We may think of the position of the individual on the
scale as his ideal utility point. A good example of the usefulness of this
kind of model is in the problem faced by a voter when choosing between
less than perfect candidates. In many situations he finds that some of the
candidates are politically to the left and the others to the right of his
own "ideal" position. Presumably he should vote for the one nearest to his
own position, whether to the left or right of his ideal.
To sketch how these ideas may be formalized, let $x$ be an individual
and $a$ and $b$ two alternatives, and let $T(x, a, b)$ be the relation of
$x$ preferring $a$ to $b$. We then want a utility function $u$ to satisfy
the following condition:

$T(x, a, b)$ if and only if $|u(x) - u(a)| \leq |u(x) - u(b)|$.

The problem of formulating conditions on the relation $T$ to guarantee the
existence of such a utility function $u$ is closely connected to the
corresponding problem for weak, strong, and strict higher ordered metrics,
and so we do not pursue it further here. Some additional remarks on the
formal problem can be found in Sec. 3.6 of Chapter 1.
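
The voter example can be made concrete with a short sketch. The positions on the scale are hypothetical, and the comparison is taken to be weak (nearer or equally near), which is an assumption about the form of the condition on $T$.

```python
def prefers(u, x, a, b):
    """T(x, a, b): a is at least as close as b to the ideal point of x."""
    return abs(u[x] - u[a]) <= abs(u[x] - u[b])

# Hypothetical positions on a left-right scale; the voter's own entry is
# his ideal point, the others are candidates.
u = {"voter": 2, "left": -5, "center": 0, "right": 6}
candidates = ["left", "center", "right"]

# "Folding" the scale at the voter's ideal point orders the candidates by
# distance from it, nearest first.
folded_order = sorted(candidates, key=lambda c: abs(u["voter"] - u[c]))

print(prefers(u, "voter", "right", "left"))
print(folded_order)
```

Note that the voter ranks the right-hand candidate above the left-hand one even though both lie on opposite sides of his ideal, which is the point of the example in the text.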


2.5 JND Assumptions

The algebraic choice theories discussed thus far all assume that the
individual makes such a clear and definite judgment of preference that the
relation of indifference is transitive, and so is an equivalence relation.
The challenge to this assumption is familiar from psychophysics (see, for
example, the discussion in Chapters 3 and 4, Vol. I). Economists have
offered similar objections to classical utility theory. A relatively early
discussion of these matters is found in Armstrong (1939, 1948, and 1951),
who argues, plausibly enough, that a given alternative $a$ may be
indifferent to a second alternative $b$, $b$ may be indifferent to a third
alternative $c$, and yet $a$ is preferred to $c$.

From a logical standpoint, the first and most natural question to ask
about nontransitive indifference relations concerns suitable axiomatic
generalizations of simple orderings. This was the concern in Luce (1956),
where the concept of a semiorder was introduced as a natural
generalization of the familiar concept of a simple ordering. Luce
originally stated the axioms in terms of two binary relations, one of
preference and one of indifference. In Scott and Suppes (1958), the axioms
were simplified and only a single binary relation of preference was used.
Their axioms are stated in the following definition.

Definition 12. A semiorder is a binary relation $P$ on a set $A$ that
satisfies the following three axioms for all $a$, $b$, $c$, and $d$ in $A$:

1. not $aPa$;

2. if $aPb$ and $cPd$, then either $aPd$ or $cPb$;

3. if $aPb$ and $bPc$, then either $aPd$ or $dPc$.

In Luce (1956) a jnd function is introduced which varies with the
individual elements of $A$, that is, the jnd function is defined on $A$.
Intuitively it would be desirable if Luce's results were the best possible
for semiorders. Unfortunately, they may be strengthened (see Scott and
Suppes, 1958) to show that a numerical interpretation of $P$ can be found
which has as a consequence that the jnd function is constant for all
elements of $A$. In particular, the following theorem may be proved
(see p. 32, Chapter 1, Vol. I for the proof).

Theorem 10. Let the binary relation $P$ be a semiorder on the finite set
$A$. Then there exists a real-valued function $u$ on $A$ such that for
every $a$ and $b$ in $A$

$aPb$ if and only if $u(a) > u(b) + 1$.
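
The converse direction of Theorem 10 is easy to illustrate: any relation defined from a numerical function by a unit threshold satisfies the three semiorder axioms, while its indifference relation need not be transitive. The utilities in the sketch below are hypothetical.

```python
from itertools import product

u = {"a": 0.0, "b": 0.75, "c": 1.5, "d": 3.0}   # hypothetical utilities
A = list(u)

def P(s, t):
    """Strict preference with a jnd of one unit."""
    return u[s] > u[t] + 1

def I(s, t):
    """Indifference: neither alternative is preferred to the other."""
    return not P(s, t) and not P(t, s)

quads = list(product(A, repeat=4))
axiom1 = all(not P(a, a) for a in A)
axiom2 = all(P(a, d) or P(c, b) for a, b, c, d in quads if P(a, b) and P(c, d))
axiom3 = all(P(a, d) or P(d, c) for a, b, c, d in quads if P(a, b) and P(b, c))

# Triples showing that indifference is not transitive.
intransitive = [(s, t, v) for s, t, v in product(A, repeat=3)
                if I(s, t) and I(t, v) and P(s, v)]

print(axiom1, axiom2, axiom3)
print(intransitive)
```

The triple $(c, b, a)$ reproduces Armstrong's observation: $c$ is indifferent to $b$, $b$ is indifferent to $a$, and yet $c$ is preferred to $a$.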

It should be noted that this theorem is restricted to finite sets of
alternatives. As is evident from our earlier discussion, additional axioms
are needed in order to prove such a representation theorem for infinite
sets. As far as we know, no substantial investigation of semiorders for
infinite sets has been made.

Matters become much more complicated if we admit subliminal differences
and at the same time attempt to obtain a numerical representation
stronger than an ordinal scale. Such a set of axioms was given by Gerlach
(1957). To obtain more powerful numerical results she followed a proposal
that originated with Wiener (1921) and introduced a four-place relation
that has the following interpretation. The relation $abLcd$ holds whenever
the subjective difference between $a$ and $b$ is algebraically sufficiently
greater than the subjective difference between $c$ and $d$. Gerlach's
axioms on $L$ are sufficiently strong to permit her to prove that there is
a real-valued function $u$ defined on the set of alternatives and a jnd
measure $\Delta$ (that is, $\Delta$ is a real number) such that $abLcd$
holds if and only if either

(i) $|u(c) - u(d)| < \Delta$ and $u(a) - u(b) \geq \Delta$, or

(ii) $|u(a) - u(b)| < \Delta$ and $u(d) - u(c) \geq \Delta$, or

(iii) $|u(a) - u(b)| \geq \Delta$ and $|u(c) - u(d)| \geq \Delta$ and
$[u(a) - u(b)] - [u(c) - u(d)] \geq \Delta$.

Note that Condition (i) corresponds to the case when $c$ and $d$ are
separated by less than a jnd, and $a$ and $b$ are separated by at least a
jnd. Condition (ii) reverses these relations, and Condition (iii) covers
the case when $a$ and $b$ as well as $c$ and $d$ are separated by at least
a jnd. Moreover, Mrs. Gerlach proved that the numerical function $u$ is
unique up to a linear transformation, with, of course, the just noticeable
difference $\Delta$ being transformed not by the entire linear
transformation, but only by the multiplicative part of the transformation.
We do not give here the rather complicated axioms formulated by
Mrs. Gerlach.
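
The three conditions and the uniqueness remark can be sketched as follows. The reading of the conditions (each inequality with a jnd on the right taken as weak, the others strict) and the sample utilities are assumptions made for illustration; the function name is hypothetical.

```python
from itertools import product

def gerlach_L(u, delta, a, b, c, d):
    """abLcd under the three-case representation with jnd measure delta."""
    ab = u[a] - u[b]
    cd = u[c] - u[d]
    if abs(cd) < delta and ab >= delta:        # condition (i)
        return True
    if abs(ab) < delta and -cd >= delta:       # condition (ii): u(d) - u(c)
        return True
    return (abs(ab) >= delta and abs(cd) >= delta
            and ab - cd >= delta)              # condition (iii)

u = {"w": 0, "x": 1, "y": 4, "z": 9}   # hypothetical utilities
delta = 2

# A positive linear transformation of u, with delta rescaled only by the
# multiplicative constant alpha and not shifted by beta.
alpha, beta = 3, 7
v = {k: alpha * val + beta for k, val in u.items()}

invariant = all(
    gerlach_L(u, delta, *q) == gerlach_L(v, alpha * delta, *q)
    for q in product(u, repeat=4))

print(gerlach_L(u, delta, "y", "w", "x", "w"))
print(gerlach_L(u, delta, "x", "w", "y", "w"))
print(invariant)
```
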
A recent extensive study of jnd structures, building on the earlier work
of Luce (1956), can be found in Adams (1963). Roughly speaking, Adams
added to Luce's ordinal concept of a semiorder an operation of
combination, with particular reference to weighing with a balance.
Although most of his detailed results are restricted to this additive case,
his methods of attack are of more general interest and could be applied to
the higher ordered metrics discussed in Sec. 2.4. Adams showed that no
matter how insensitive a balance is, it is always possible, on the basis of
a finite number of comparisons on the pans of the balance, to determine the
weight of any object to an arbitrary degree of accuracy. Of course, for a
fixed set of objects and a fixed number of observations the accuracy has a
fixed limit. The primary limitation in applying his methods to
psychological studies of preference is the absence of any probabilistic
considerations of the sort surveyed in Secs. 5 to 8 of this chapter.


3. ALGEBRAIC CHOICE THEORIES FOR
UNCERTAIN OUTCOMES

3.1 The Expected Utility Hypothesis

The ordinal theory of Pareto, which dominated the economic theory of
utility from the beginning of this century until the publication of von
Neumann and Morgenstern's treatise on the theory of games in 1944,
rested squarely on the assumption that the individual in choosing among
alternatives has no uncertainty about the consequences of these
alternatives. Once uncertainty in the consequences is admitted, no ordinal
theory of choice can be satisfactory. The simplest sorts of examples
suffice to make this fact clear. Consider an individual who has five dollars
that he is thinking about betting on the outcome of a throw of a die.
Suppose that if he makes the bet and the die comes up either one or three,
he will receive nine dollars, whereas, if the die comes up two, four, five,
or six, he will lose his initial five dollars. His problem is to decide
whether to bet the five dollars or not. Granting the ordinal assumption
that the individual surely prefers more money to less (in this case, that
he prefers nine to five to zero dollars) does not carry the individual very
far in deciding whether or not to place the bet. Only slight reflection
makes it clear that this rather artificial example of deciding whether or
not to place a bet is but one of many examples, some of which are quite
serious, in which decisions must be made in the face of inherently risky
outcomes. One of the most common examples, one that almost all (middle and
upper class) individuals in our society face, is the variety of decisions
about what and how much insurance to carry. A typical middle-class member
of our society now has insurance coverage for automobile collision and
liability, his own death, destruction of his house by fire, loss of
possessions by theft, medical and hospital insurance, and possibly
accident and income insurance. Every decision to invest in such an
insurance policy involves a choice in the face of uncertain outcomes, for
the individual does not know what the present state of affairs will lead to
in the way of future consequences for him. Insurance is a way of taking a
decision against suffering a financial disaster, either to himself or to
his family, in case a personal catastrophe does occur.

The expected utility hypothesis, which probably was first clearly
formulated by Daniel Bernoulli (1738), is the most important approach that
has yet been suggested for making decisions in the context of uncertain
outcomes. The fundamental idea is exceedingly simple. The individual
must make a decision from among several possible alternatives. The
possible decisions may have a variety of consequences, and ordinarily the
consequences are not simply determined by the decision taken but are
also affected by the present state of affairs (also called the state of
nature). We suppose that the subject has a utility function on the possible
consequences and that he has a probability function on the possible states
of nature. According to the expected utility hypothesis, the wise decision
maker selects a decision or course of action that maximizes his
expectation.
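
The die bet described earlier shows why ordinal information alone cannot settle such a decision. The sketch below assumes that the outcomes are final holdings of nine, five, or zero dollars and uses two hypothetical utility functions that both respect that ordering; under the expected utility hypothesis they lead to opposite decisions.

```python
from fractions import Fraction

# The bet is won when the die shows one or three.
p_win = Fraction(2, 6)

def expected_utility_of_bet(u):
    """Expectation of utility when the five dollars are staked."""
    return p_win * u[9] + (1 - p_win) * u[0]

def decides_to_bet(u):
    """Bet only if the expectation exceeds the utility of keeping $5."""
    return expected_utility_of_bet(u) > u[5]

# Two hypothetical utility functions with u(9) > u(5) > u(0).
linear = {0: 0, 5: 5, 9: 9}
steep = {0: 0, 5: 1, 9: 4}

print(expected_utility_of_bet(linear), decides_to_bet(linear))
print(expected_utility_of_bet(steep), decides_to_bet(steep))
```

Both functions agree with the ordinal assumption that more money is preferred to less, so the difference in the decisions comes entirely from the numerical spacing of the utilities.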

It is perhaps useful to illustrate this important complex of ideas by
considering a simple example such as the decision of whether or not to go
to a football game in uncertain weather. Let the set $S$ of states of nature
have as members the two possible states of raining, $s_1$, or not raining,
$s_2$, during the game. Let the set $C$ of possible consequences be those of
being at the game and not being rained on, $c_1$, staying home, $c_2$, and
being at the game and being rained on, $c_3$. The two decisions are going to
the game, $d_1$, and not going to the game, $d_2$. Formally, $d_1$ and $d_2$
are functions from $S$ to $C$ such that $d_1(s_1) = c_3$, $d_1(s_2) = c_1$,
and $d_2(s_1) = d_2(s_2) = c_2$. Suppose now that the individual assigns a
subjective probability of $\frac{1}{3}$ to $s_1$ and $\frac{2}{3}$ to $s_2$,
and that he prefers consequence $c_1$ to $c_2$ to $c_3$. It should be evident,
as we remarked earlier, that such merely ordinal preferences are insufficient
to lead to a rational decision between $d_1$ and $d_2$. Let him, however, also
assign numerical values to the consequences; in particular, let his utility
function $u$ be such that $u(c_1) = 12$, $u(c_2) = 6$, $u(c_3) = -9$ (and we
suppose $u$ is unique up to a choice of unit and zero). Then the expected
utility hypothesis requires him to compute the expectation in the ordinary
sense of random variables for both $d_1$ and $d_2$, using the numerical
utility function to define the values of the random variables, and then to
choose the decision that has the greater expectation. He finds that

$$E(d_1) = \tfrac{1}{3}(-9) + \tfrac{2}{3}(12) = 5,$$

$$E(d_2) = \tfrac{1}{3}(6) + \tfrac{2}{3}(6) = 6,$$

and so he should elect not to go to the game, $d_2$.
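
The computation in the football example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys (`rain`, `c1_game_dry`, and so on) are hypothetical labels introduced here, while the probabilities and utilities are those of the example.

```python
from fractions import Fraction

# Subjective probabilities of the two states of nature (from the example).
prob = {"rain": Fraction(1, 3), "no_rain": Fraction(2, 3)}

# Utilities of the three consequences (from the example).
utility = {"c1_game_dry": 12, "c2_home": 6, "c3_game_wet": -9}

# Each decision is a function from states of nature to consequences.
decisions = {
    "d1_go": {"rain": "c3_game_wet", "no_rain": "c1_game_dry"},
    "d2_stay": {"rain": "c2_home", "no_rain": "c2_home"},
}

def expected_utility(decision):
    """Probability-weighted sum of utilities over the states of nature."""
    return sum(prob[s] * utility[decision[s]] for s in prob)

expectations = {name: expected_utility(d) for name, d in decisions.items()}
best = max(expectations, key=expectations.get)
print(expectations, best)
```

Exact fractions are used so that the two expectations come out as the integers given in the text rather than as rounded floating-point values.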

A central problem for normative or descriptive behavior theory is to
state axioms on behavior that lead to a numerical representation of utility
and probability so that decisions are based on a maximization of expected
utility. In the next subsection, 3.2, we consider axiom systems of this sort
which assume in their statement the existence of numerical probabilities.
In Sec. 3.3 we widen the context by considering axiom systems that also
impose behavioral assumptions on probability, that, in essence, treat
probability as a subjective degree of belief. In Sec. 3.4 we survey briefly
various decision principles that have been proposed as alternatives to the
expected utility hypothesis.

Before we turn to detailed analyses of behavioral assumptions that yield
the expected utility hypothesis, it is natural to ask just how much beyond
ordinal requirements is needed to sustain the expected utility analysis.
From the remarks made in Sec. 2.4 about representation problems for higher
ordered metrics, it should be evident that the situation is not simple.
Suppose, for simplicity at the moment, that we have only a finite number of
states of nature. We denote by $s_i$ the numerical probability (objective or
subjective) of state $i$ and by $u_{id}$ the numerical utility of the
consequence that results from taking decision $d$ when state $i$ is the true
state of nature. We then express the notion that the expectation of decision
$d$ is greater than that of decision $e$ by the following inequality:

$$\sum_i s_i u_{id} > \sum_i s_i u_{ie}.$$

It is simple enough to construct counterexamples to show that probability
and utility must be measured on scales stronger than ordinal ones in
order to preserve this inequality under admissible transformations. On
the other hand, when the set of states of nature is finite and no additional
assumptions are made, we cannot prove that for given decisions $d$ and $e$
utility must be measured on an interval scale, even if we assume that
numerical probability measures are already given. For present purposes,
there are really not any natural groups of transformations lying between
the group of monotone increasing transformations (ordinal scales) and
the group of positive linear transformations (interval scales) that permit a
precise analysis of the uniqueness aspect of the expected utility hypothesis.
Consequently, as we shall see, the various axiom systems developed to
represent the expected utility hypothesis end up with the (unnecessarily
restrictive) result that utility is measured on an interval scale. This is
done, of course, by imposing rather strong structural conditions of an
existential character on either the set of states of nature or on the set of
consequences, for quite strong assumptions are necessary in order to show
that utility must be measured on an interval scale.
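
One such counterexample can be sketched numerically. The following Python fragment is an illustration only: it reuses the probabilities and utilities of the football example, and the two rescalings (`2v + 3` and `v ** 3`) are arbitrary choices made here, one positive linear and one merely monotone increasing.

```python
from fractions import Fraction

prob = [Fraction(1, 3), Fraction(2, 3)]   # states: rain, no rain
u_go, u_stay = [-9, 12], [6, 6]           # utilities of each decision by state

def expectation(values):
    """Expected value of a list of state-indexed utilities."""
    return sum(p * v for p, v in zip(prob, values))

def prefers_go(transform):
    """True if going has the larger expectation after rescaling utility."""
    go = expectation([transform(v) for v in u_go])
    stay = expectation([transform(v) for v in u_stay])
    return go > stay

original = prefers_go(lambda v: v)          # expectations 5 and 6
linear = prefers_go(lambda v: 2 * v + 3)    # expectations 13 and 15
monotone = prefers_go(lambda v: v ** 3)     # expectations 909 and 216
print(original, linear, monotone)
```

The positive linear rescaling leaves the comparison of expectations unchanged, whereas the cubic rescaling, although it preserves the ordinal ranking of the three consequences, reverses it.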


3.2 Axiom Systems That Assume Numerical Probability

In this title we have deliberately said "numerical probability" instead of
"objective probability," because the axiom systems we shall consider can
be interpreted in terms of subjective probability provided that we assume a
prior numerical measurement of subjective probability that satisfies the
ordinary probability axioms. The scheme for measuring utility by using
numerical probability originates with von Neumann and Morgenstern
(1944). Because of its historical importance and its essential
simplicity, we state with only trivial modifications the original von
Neumann-Morgenstern axiom system. Numerical probabilities enter into
the statement of the axioms in the following essential way. If we have two
alternatives, $x$ and $y$, then we may consider the new alternative
consisting of alternative $x$ with probability $\alpha$ and alternative $y$
with probability $1 - \alpha$. We denote this new alternative by $x\alpha y$,
which expression is often called the $\alpha$ mixture of $x$ and $y$. Once
such mixtures are available, we may ask the subject for his preferences not
only among pure alternatives but also among probability mixtures of
alternatives. Thus we might ask if he prefers the pure alternative $z$ to an
$\alpha$ mixture of $x$ and $y$, that is, to receiving $x$ with probability
$\alpha$ and $y$ with probability $1 - \alpha$. It is important to realize,
of course, that $x$ and $y$ are not numbers and that the juxtaposition
$x\alpha y$ does not signify multiplication. The expression $x\alpha y$
denotes a single ternary operation, say $h$, and postulates are then required
to express its properties. To make this point quite explicit, we could write
in place of "$x\alpha y$" a more explicit functional notation such as
$h(x, \alpha, y)$. In terms of this notation, the following Axioms 3 and 4
would be expressed as

$$h(x, \alpha, y) = h(y, 1 - \alpha, x),$$
$$h[h(x, \alpha, y), \beta, y] = h(x, \alpha\beta, y).$$

The von Neumann-Morgenstern axioms are incorporated in the following
definition.

Definition 13. A triple $\mathfrak{A} = \langle A, R, h \rangle$ is a von
Neumann-Morgenstern system of utility if and only if the following axioms are
satisfied for every $x$, $y$, and $z$ in $A$ and every $\alpha$ and $\beta$ in
the open interval $(0, 1)$, where the operation $h$ is denoted by
juxtaposition:

1. $R$ is a weak ordering of $A$;

2. $x\alpha y$ is in $A$;

3. $x\alpha y = y(1 - \alpha)x$;

4. $(x\alpha y)\beta y = x(\alpha\beta)y$;

5. if $xIy$, then $x\alpha z \, I \, y\alpha z$;

6. if $xPy$, then $x \, P \, x\alpha y$ and $x\alpha y \, P \, y$;

7. if $xPy$ and $yPz$, then there is a $\gamma$ in $(0, 1)$ such that $y \, P \, x\gamma z$;

8. if $xPy$ and $yPz$, then there is a $\gamma$ in $(0, 1)$ such that $x\gamma z \, P \, y$.

Axiom 1 just requires the familiar ordinal restriction that $R$ be a weak
ordering of $A$. The remaining axioms together impose much stronger
requirements, which are difficult to satisfy exactly in practice. This
difficulty has, in fact, been much of the source of inspiration for the
subsequent developments considered in Secs. 5 to 8. The second axiom is simply
a closure axiom requiring that if two alternatives are in the set $A$, then
any probability mixture of them is also in the set of alternatives. It
follows from this postulate, from the existence of $a, b \in A$ such that
$aPb$, and from Axiom 1 that the set $A$ is infinite. (In discussions of this
sort we always implicitly require that $A$ be nonempty and, here, we also
require that there be at least two alternatives, one of which is strictly
preferred to another.) The third and fourth axioms state simple assumptions
about mixtures and their combinations. In more recent discussions,
the properties expressed by Axioms 2, 3, and 4 are central to the
characterization of what are called mixture spaces (see discussion later of
the Herstein and Milnor axiomatization of utility). Notice that both Axioms
3 and 4 are trivially true if we interpret $x$ and $y$ as numbers. The fifth
axiom asserts that if an individual is indifferent between $x$ and $y$, then
he is also indifferent between any probability mixture of $x$ and $z$ and the
same mixture of $y$ and $z$. The sixth axiom asserts that if alternative $x$
is strictly preferred to alternative $y$, then $x$ is strictly preferred to
any probability mixture of $x$ and $y$, and any probability mixture of $x$ and
$y$ is strictly preferred to $y$. The last two axioms state what amount to a
continuity condition on preferences. If $y$ is between $x$ and $z$ in strict
preference, then Axiom 7 asserts that there is a probability mixture of $x$
and $z$ to which $y$ is preferred, and Axiom 8 asserts that there is a
probability mixture of $x$ and $z$ that is preferred to $y$.

The proof that these eight axioms are sufficient to guarantee the existence
of a numerical utility function, unique up to a linear transformation, is
rather tedious and is not given here. Readers are referred to the quite
detailed proof in von Neumann and Morgenstern (1947). The precise
theorem they established is the following.

Theorem 11. Let $\mathfrak{A} = \langle A, R, h \rangle$ be a von
Neumann-Morgenstern system of utility. Then there exists a real-valued
function $u$ defined on $A$ such that for every $x$ and $y$ in $A$ and
$\alpha$ in $(0, 1)$

(i) $xRy$ if and only if $u(x) \geq u(y)$,

(ii) $u(x\alpha y) = \alpha u(x) + (1 - \alpha)u(y)$.

Moreover, if $u'$ is any other function satisfying (i) and (ii), then $u'$ is
related to $u$ by a positive linear transformation.

The theorem just stated shows that the axioms are sufficient for the
representation; it is also interesting to observe that they are necessary as
well, once the concept of a mixture space is introduced. That is, once we
require Axioms 2, 3, and 4, then the remaining axioms express necessary
conditions as well as sufficient ones on any numerical utility function
having the properties expressed in the theorem.
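
A concrete mixture space may help fix ideas. The Python sketch below is an illustration under assumptions made here, not part of the axiomatization: alternatives are modeled as probability distributions over a finite set of outcomes (dictionaries from outcome to probability), the names `mix`, `u`, and `value` are hypothetical, and the outcome values are arbitrary. It checks Axioms 3 and 4 and property (ii) of Theorem 11 for one choice of $x$, $y$, $\alpha$, and $\beta$.

```python
from fractions import Fraction

def mix(x, alpha, y):
    """The alpha mixture of lotteries x and y (dicts: outcome -> probability)."""
    outcomes = set(x) | set(y)
    return {o: alpha * x.get(o, 0) + (1 - alpha) * y.get(o, 0) for o in outcomes}

def u(lottery, value):
    """Expected value of a lottery under the outcome values `value`."""
    return sum(p * value[o] for o, p in lottery.items())

value = {"a": 10, "b": 4, "c": 0}   # hypothetical utilities of pure outcomes
x = {"a": Fraction(1)}
y = {"b": Fraction(1, 2), "c": Fraction(1, 2)}
alpha, beta = Fraction(1, 4), Fraction(2, 3)

# Axiom 3: x alpha y = y (1 - alpha) x
axiom3 = mix(x, alpha, y) == mix(y, 1 - alpha, x)
# Axiom 4: (x alpha y) beta y = x (alpha beta) y
axiom4 = mix(mix(x, alpha, y), beta, y) == mix(x, alpha * beta, y)
# Property (ii): u(x alpha y) = alpha u(x) + (1 - alpha) u(y)
linear = u(mix(x, alpha, y), value) == alpha * u(x, value) + (1 - alpha) * u(y, value)
print(axiom3, axiom4, linear)
```

In this model the mixture axioms hold as identities between distributions, which is why they can be tested with exact equality.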

Since the publication of the von Neumann and Morgenstern axiomatization,
there have been a number of modifications suggested in the literature
and, more important, considerable simplifications of the proof originally
given by von Neumann and Morgenstern. The intensive examination of
this axiomatization is undoubtedly due to the considerable importance
that has been attached to it by economists and also by mathematical
statisticians. One of the first searching examinations was by Marschak
(1950). Perhaps the most discussed subsequent axiomatization is that
given by Herstein and Milnor (1953). The notation of Herstein and Milnor
has been changed slightly to conform as much as possible to that used
earlier in defining a von Neumann-Morgenstern system of utility. Their
axioms are given in the following definition.

Definition 14. A triple $\mathfrak{A} = \langle A, R, h \rangle$ is a
Herstein-Milnor system of utility if and only if the following axioms are
satisfied for every $x$, $y$, and $z$ in $A$ and every $\alpha$ and $\beta$ in
the closed interval $[0, 1]$, where the operation $h$ is denoted by
juxtaposition:

1. $R$ is a weak ordering of $A$;

2. $x\alpha y$ is in $A$;

3. $x\alpha y = y(1 - \alpha)x$;

4. $(x\alpha y)\beta y = x(\alpha\beta)y$;

5. $x1y = x$;

6. if $xIy$, then $x\tfrac{1}{2}z \, I \, y\tfrac{1}{2}z$;

7. the sets $\{\alpha \mid x\alpha y \, R \, z\}$ and $\{\alpha \mid z \, R \, x\alpha y\}$ are closed.

The first four axioms of Herstein and Milnor, as formulated here, are
precisely the same as the first four axioms of von Neumann and Morgenstern
given earlier. Herstein and Milnor's fifth axiom differs because they
permit probabilities to lie in the closed interval $[0, 1]$, rather than just
in the open interval $(0, 1)$. A set $A$ and a ternary operation satisfying
Axioms 2 to 5 are said to constitute a mixture space. The sixth axiom of
Herstein and Milnor is just a weakening of von Neumann and Morgenstern's
Axiom 5: from their sixth axiom one can prove Axiom 5 of the earlier
definition. Finally, their seventh axiom, formulated in terms of closed
sets, just imposes a condition of continuity, and it has essentially the same
consequences as Axioms 7 and 8 of the earlier definition. It is pretty much
a matter of mathematical taste whether one prefers the kind of elementary
continuity formulation contained in the earlier definition or the very simple
topological assumption made in Axiom 7 by Herstein and Milnor. Probably
the main advantage of the topological version is that the resulting
proof of the existence of a numerical utility function is more elegant.
Although we shall not give the complete Herstein-Milnor proof that
there exists a numerical utility function for any system satisfying their
axioms, we list their series of theorems and for the conceptually more
important ones indicate the nature of the proof.

Theorem 12. If $x$, $y$, and $z$ are in $A$ and $xRy$ and $yRz$, then there
exists a probability $\alpha$ such that $y \, I \, x\alpha z$.

PROOF. Let $T = \{\alpha \mid x\alpha z \, R \, y\}$. By virtue of Axiom 7,
$T$ is a closed subset of the unit interval $[0, 1]$. Moreover, since $xRy$ by
hypothesis and $x1z = x$ by Axiom 5, 1 is in $T$ and thus $T$ is not empty.
Similarly, $W = \{\beta \mid y \, R \, x\beta z\}$ is closed and nonempty. By
virtue of Axioms 1 and 2, every probability, that is, every number in the
closed interval $[0, 1]$, is in either $T$ or $W$, and therefore
$T \cup W = [0, 1]$. Since the unit interval is topologically connected, that
is, it cannot be decomposed into a union of two closed, nonempty, disjoint
sets, it follows that $T \cap W$ is not empty; any element $\alpha$ in the
intersection satisfies the requirement of the theorem.

Theorem 13. If $x$ and $y$ are in $A$ and $xIy$, then $x\alpha z \, I \, y\alpha z$.

Note that Theorem 13 is Axiom 5 of Def. 13. This theorem shows that
the weakened Axiom 6 of Def. 14 is adequate.

Theorem 14. If $xPy$ and if $0 < \alpha < 1$, then $x\alpha y \, P \, y$.

This theorem is just the second part of Axiom 6 of Def. 13.

Theorem 15. If $xPy$, then $x\alpha y \, P \, x\beta y$ if and only if $\alpha > \beta$.

Theorem 16. All alternatives in $A$ are indifferent or there are an infinite
number of alternatives differing in preference.

PROOF. If for some $x$ and $y$ in $A$, $xPy$, then by Theorem 15 and the
density property of the real numbers there is an infinite number of
preferentially distinct alternatives.

Theorem 12 is next strengthened to a uniqueness condition.

Theorem 17. If $xPy$ and $yPz$, then there is a unique $\alpha$ such that
$y \, I \, x\alpha z$.

PROOF. Use Theorem 15.
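
When a linear utility function is already in hand, the unique probability of Theorem 17 can be written down directly: solving $u(y) = \alpha u(x) + (1 - \alpha)u(z)$ gives $\alpha = [u(y) - u(z)] / [u(x) - u(z)]$. The Python sketch below illustrates this under that assumption; the function name `indifference_probability` is hypothetical, and the sample utilities are borrowed from the football example.

```python
from fractions import Fraction

def indifference_probability(ux, uy, uz):
    """Return alpha with uy == alpha*ux + (1 - alpha)*uz.

    Assumes integer utilities of a linear utility function with
    ux > uy > uz, that is, x P y and y P z.
    """
    if not ux > uy > uz:
        raise ValueError("requires x P y and y P z, i.e. ux > uy > uz")
    return Fraction(uy - uz, ux - uz)

alpha = indifference_probability(12, 6, -9)
mixture_utility = alpha * 12 + (1 - alpha) * (-9)
print(alpha, mixture_utility)
```

The theorems themselves run in the opposite direction, of course: the indifference probabilities are obtained from preferences alone and are then used to construct the utility function.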

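The following elementary theorem is not explicitly stated by Herstein
and Milnor, but it is useful in establishing the "linearity" of the utility
function, that is, property (ii) of Theorem 11.

Theorem 18. $(x\beta y)\alpha(x\gamma y) = x[\alpha\beta + (1 - \alpha)\gamma]y$.

PROOF. Assume, without loss of generality, that $\beta \leq \gamma$ and
$\gamma > 0$. Then

$$x[\alpha\beta + (1 - \alpha)\gamma]y = x\left[\gamma\left(1 - \alpha\,\frac{\gamma - \beta}{\gamma}\right)\right]y$$

$$= (x\gamma y)\left[1 - \alpha\,\frac{\gamma - \beta}{\gamma}\right]y \quad \text{(by Axiom 4)}$$

$$= y\left[\alpha\,\frac{\gamma - \beta}{\gamma}\right](x\gamma y) \quad \text{(by Axiom 3)}$$

$$= y\left[\left(1 - \frac{\beta}{\gamma}\right)\alpha\right](x\gamma y)$$

$$= \left[y\left(1 - \frac{\beta}{\gamma}\right)(x\gamma y)\right]\alpha(x\gamma y) \quad \text{(by Axiom 4)}$$

$$= \left[(x\gamma y)\frac{\beta}{\gamma}\,y\right]\alpha(x\gamma y) \quad \text{(by Axiom 3)}$$

$$= (x\beta y)\alpha(x\gamma y) \quad \text{(by Axiom 4)}.$$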
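
The identity of Theorem 18 can be spot-checked numerically in a concrete mixture space. The sketch below is an illustration under assumptions made here: alternatives are modeled as probability distributions over two hypothetical outcomes, `win` and `lose`, and the identity is tested on a small grid of probabilities rather than proved.

```python
from fractions import Fraction
from itertools import product

def mix(x, alpha, y):
    """The alpha mixture of lotteries x and y (dicts: outcome -> probability)."""
    outcomes = set(x) | set(y)
    return {o: alpha * x.get(o, 0) + (1 - alpha) * y.get(o, 0) for o in outcomes}

x = {"win": Fraction(1)}
y = {"lose": Fraction(1)}
grid = [Fraction(k, 4) for k in range(5)]   # probabilities 0, 1/4, ..., 1

# (x beta y) alpha (x gamma y) == x [alpha*beta + (1 - alpha)*gamma] y
identity_holds = all(
    mix(mix(x, b, y), a, mix(x, g, y)) == mix(x, a * b + (1 - a) * g, y)
    for a, b, g in product(grid, repeat=3)
)
print(identity_holds)
```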


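Theorem 19. Suppose $xPy$ and define $S_{xy} = \{z \mid xRzRy\}$. Let $f$ and
$g$ be two linear, order-preserving functions defined on $S_{xy}$ such that
for $r_0$ and $r_1$ in $S_{xy}$ with $r_1 P r_0$, $f(r_0) = g(r_0)$ and
$f(r_1) = g(r_1)$. Then $f$ and $g$ are identical on $S_{xy}$.

PROOF. To say that $f$ is linear and order-preserving on $S_{xy}$ is just to
say that for $s$ and $t$ in $S_{xy}$

$$f(s\alpha t) = \alpha f(s) + (1 - \alpha)f(t)$$

and

$$sRt \text{ if and only if } f(s) \geq f(t).$$

First, suppose $r_1 R z R r_0$. Then $z \, I \, r_1\alpha r_0$ for some
$\alpha$ by virtue of Theorem 12. Thus

$$f(z) = \alpha f(r_1) + (1 - \alpha)f(r_0)$$

$$= \alpha g(r_1) + (1 - \alpha)g(r_0) \quad \text{(by hypothesis)}$$

$$= g(r_1\alpha r_0) \quad \text{(by linearity)}$$

$$= g(z).$$

Suppose now that $zPr_1$. Then again by virtue of Theorem 12, for some
$\alpha$, $r_1 \, I \, z\alpha r_0$. Therefore

$$f(r_1) = \alpha f(z) + (1 - \alpha)f(r_0)$$

$$= g(r_1)$$

$$= \alpha g(z) + (1 - \alpha)g(r_0),$$

and since $zPr_1$, $\alpha > 0$ and therefore $f(z) = g(z)$. A similar proof
holds for the remaining case of $r_0Pz$.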

Theorem 20. There exists a linear, order-preserving utility function on $A$,
and this utility function is unique up to a positive linear transformation.

PROOF. Suppose $xPy$ and define as in Theorem 19

$$S_{xy} = \{z \mid xRzRy\}.$$

By virtue of Theorem 17, for any $z$ in $S_{xy}$ there is a unique
$\alpha_{xy}(z)$ such that

$$z \, I \, x\alpha_{xy}(z)y.$$

We choose now $r_0$ and $r_1$ with $r_1Pr_0$, and we keep them fixed for the
rest of the proof. We consider only pairs $x$, $y$ such that
$r_0, r_1 \in S_{xy}$, and for any $z \in S_{xy}$, we define

$$u_{xy}(z) = \frac{\alpha_{xy}(z) - \alpha_{xy}(r_0)}{\alpha_{xy}(r_1) - \alpha_{xy}(r_0)}.$$

Now by virtue of Theorem 15, $\alpha_{xy}(z) > \alpha_{xy}(w)$ if and only if
$zPw$, and thus $u_{xy}(z) > u_{xy}(w)$ if and only if $zPw$.

The linearity of $u_{xy}$ is established by the following argument. Consider
$z\beta w$. Then by Theorem 17 (dropping the subscripts on $\alpha$ and $u$)

$$z \, I \, x\alpha(z)y,$$

$$w \, I \, x\alpha(w)y,$$

and therefore

$$z\beta w \, I \, [x\alpha(z)y]\beta[x\alpha(w)y],$$

and thus by Theorem 18

$$z\beta w \, I \, x[\beta\alpha(z) + (1 - \beta)\alpha(w)]y.$$

Now by definition,

$$\beta u(z) + (1 - \beta)u(w) = \frac{\beta\alpha(z) + (1 - \beta)\alpha(w) - \alpha(r_0)}{\alpha(r_1) - \alpha(r_0)}$$

$$= u\{x[\beta\alpha(z) + (1 - \beta)\alpha(w)]y\} \quad \text{(by definition)}$$

$$= u[z\beta w] \quad \text{(by the above equivalence)}.$$

Obviously $u(r_1) = 1$ and $u(r_0) = 0$, and in view of Theorem 19, the
numerical value of $u_{xy}$ for any $z \in S_{xy}$ is independent of $x$ and
$y$. Thus $u$ may be extended without ambiguity or inconsistency over the
whole of $A$.

The proof that $u$ is unique up to a positive linear transformation we leave
as an exercise.
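
The normalization used in this proof can be illustrated with simulated data. In the Python sketch below, which rests on assumptions made here, the indifference probabilities $\alpha_{xy}(z)$ are generated from a hypothetical hidden linear utility `hidden`; in an application they would come from the subject's indifference judgments. The point of the sketch is that $u_{xy}(z)$ does not depend on which bracketing pair $x$, $y$ is used.

```python
from fractions import Fraction

# Hypothetical hidden utilities, used only to simulate indifference judgments.
hidden = {"x1": 20, "x2": 15, "z": 7, "r1": 5, "r0": 1, "y1": 0, "y2": -4}

def alpha(x, y, z):
    """Simulated alpha_xy(z): the probability with z I x alpha y."""
    return Fraction(hidden[z] - hidden[y], hidden[x] - hidden[y])

def u(x, y, z, r0="r0", r1="r1"):
    """Normalized utility u_xy(z), with u(r1) = 1 and u(r0) = 0."""
    num = alpha(x, y, z) - alpha(x, y, r0)
    den = alpha(x, y, r1) - alpha(x, y, r0)
    return num / den

u_first = u("x1", "y1", "z")    # bracketing pair x1, y1
u_second = u("x2", "y2", "z")   # bracketing pair x2, y2
print(u_first, u_second)
```

Both calls return the same value because the normalization by $r_0$ and $r_1$ cancels the dependence on the bracketing pair, which is the content of the appeal to Theorem 19.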

Generalizations of the von Neumann-Morgenstern approach to utility
have been made in several directions. Hausner (1954) dropped the
Archimedean properties, expressed by Axioms 7 and 8 of Def. 13, and he
obtained a representation in terms of vectors rather than real numbers.
(His axioms are not precisely the remaining Axioms, 1 to 6, of Def. 13,
but they are essentially equivalent.)

A different generalization has been pursued by Aumann (1962), who
investigated the consequences of dropping the connectivity axiom which
demands that any two alternatives be comparable in preference, that is,
that one must be weakly preferred to the other. Aumann weakened Eq. 1
of Sec. 2.1 to the implication

$$\text{if } xRy, \text{ then } u(x) \geq u(y). \tag{5}$$

In addition to the axioms on a mixture space (Axioms 2 to 5 of Def. 14),
he postulated that

(i) $R$ is transitive and reflexive on the mixture space,

(ii) if $0 < \alpha < 1$, then $xRy$ if and only if, for all $z$, $x\alpha z \, R \, y\alpha z$,

(iii) if $x\alpha y \, P \, z$ for all $\alpha > 0$, then not $zPy$.

Without assuming connectivity he proved that there exists a linear utility
function satisfying Implication 5.

An earlier result of the same sort, but in the context of a finite set of
alternatives for which subjective probability is also derived from behavioral
assumptions, was given by Davidson, Suppes, and Siegel (1957, Chapter 4).
Still a third sort of generalization has been pursued by Pfanzagl (1959a,
b). He introduced the general concept of a "metric" operation in an
ordered set and developed the consequences of having two such operations.
Roughly speaking, a binary operation $\circ$ is a metric operation in
Pfanzagl's sense if it is monotonic, continuous, and bisymmetric in the
ordering relation [it is bisymmetric if
$(x \circ y) \circ (z \circ w) \, I \, (x \circ z) \circ (y \circ w)$].
Moreover, he showed that two metric operations $\circ$ and $*$ on the same
ordered set lead to scales identical up to linear transformations if the
isometry relation

$$(x \circ y) * (z \circ w) \, I \, (x * z) \circ (y * w)$$

holds for any elements $x$, $y$, $z$, and $w$ in the set. The mixture space
operation $x\alpha y$ is a special case of his metric operation, and two
distinct probabilities $\alpha$ and $\beta$ satisfy the isometry relation,
that is,

$$(x\alpha y)\beta(z\alpha w) \, I \, (x\beta z)\alpha(y\beta w).$$

Among other things, to construct a utility scale unique up to a linear
transformation it is sufficient to consider only mixtures with a fixed
probability $\alpha$. The reader is referred to Pfanzagl's monograph (1959a)
for full details on these matters.
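
That mixtures satisfy bisymmetry and the isometry relation is easy to check at the level of utility values, where a mixture with probability $\alpha$ acts as the weighted mean $\alpha a + (1 - \alpha)b$. The Python sketch below is an illustration under that assumption; the particular probabilities and sample points are arbitrary choices made here.

```python
from fractions import Fraction
from itertools import product

def mixture(alpha):
    """Binary operation a, b -> alpha*a + (1 - alpha)*b on utility values."""
    return lambda a, b: alpha * a + (1 - alpha) * b

circ = mixture(Fraction(1, 3))   # plays the role of the operation written as a circle
star = mixture(Fraction(3, 5))   # plays the role of the operation written as a star
points = [Fraction(n) for n in (-2, 0, 1, 5)]

# Bisymmetry: (x o y) o (z o w) equals (x o z) o (y o w)
bisymmetric = all(
    circ(circ(x, y), circ(z, w)) == circ(circ(x, z), circ(y, w))
    for x, y, z, w in product(points, repeat=4)
)
# Isometry: (x o y) * (z o w) equals (x * z) o (y * w)
isometric = all(
    star(circ(x, y), circ(z, w)) == circ(star(x, z), star(y, w))
    for x, y, z, w in product(points, repeat=4)
)
print(bisymmetric, isometric)
```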


3.3 Axiom Systems for Subjective Probability

In this subsection we consider the most important direction in which
the von Neumann-Morgenstern approach has been generalized, namely,
the derivation of a numerical subjective probability function, as well as a
utility function, from qualitative postulates on preferences among choices.
In many situations in which an individual must decide or choose among
alternatives, there is no quantitative measure of probability at hand,
ready to be used to evaluate the expected utility of the various alternatives.
Thus it is natural to try to extend the axiom systems considered in the
previous section to include axioms on decisions that yield measures of
both utility and probability, thereby extending the domain of applicability
of the expected utility hypothesis. Because it is intended that such enlarged
axiom systems should apply when a relative frequency characterization of
probability is either not available or not meaningful, it is customary in the
literature to refer to the probability that is derived as subjective
probability, in contrast to the objective probability defined in terms of
relative frequency. We shall say more about the subjective character of
probability as we consider various proposals that have been made.

A little reflection on the problem of jointly axiomatizing structural
conditions that are sufficient to yield measurement of both utility and
subjective probability suggests two different ways to proceed. One is to
attempt to state axioms in such a way that we obtain first a measure of
utility, which is then used to obtain a measure of subjective probability.
The other approach proceeds in the reverse order: we state axioms that
permit us first to obtain a measure of subjective probability, which is then
used to measure utility along the line of argument described in the
preceding subsection. The earliest approach (Ramsey, 1931)* followed the
first tack, that is, utility is measured first. Ramsey's essential idea was to
find a chance event with subjective probability of one-half, then to use
this event to determine the utilities of outcomes, and, finally, to apply the
constructed utility function to measure the subjective probabilities of the
states of nature.

Because this approach has been used extensively in choice experiments
by a number of people (its experimental application originates with
Davidson, Suppes, & Siegel, 1957), we describe it in greater detail. The
first thing to make clear is that one can determine if a chance event has
subjective probability $\frac{1}{2}$ without first having a quantitative
measure of probability. Suppose a subject is choosing between the two options
shown in the following matrix:

                 Option 1    Option 2
   $E$             $a$         $c$
   $\bar{E}$       $b$         $d$

If the subject chooses Option 1 and if event $E$ happens, then he receives
outcome $a$. If, however, the complementary event $\bar{E}$ happens, he
receives outcome $b$. On the other hand, if he chooses Option 2 and event $E$
occurs, then he receives outcome $c$, whereas, if $\bar{E}$ occurs, he
receives $d$. If the subject chooses Option 1 over Option 2, and if we had a
subjective probability function $s$ and a utility function $u$ with the
(subjective) expected utility property, then we would express his choice by
the following inequality:

$$s(E)u(a) + s(\bar{E})u(b) \geq s(E)u(c) + s(\bar{E})u(d). \tag{6}$$

If the subject were indifferent between the options, the inequality would
become the following equality:

$$s(E)u(a) + s(\bar{E})u(b) = s(E)u(c) + s(\bar{E})u(d). \tag{7}$$

The notion of indifference is critical to Ramsey's theory.
We now attempt to find an $E^*$ such that for every pair of alternatives $a$
and $b$ we have

$$s(E^*)u(a) + s(\bar{E}^*)u(b) = s(E^*)u(b) + s(\bar{E}^*)u(a). \tag{8}$$

*In actual fact, Ramsey's two essays on this matter were written in 1926 and
1928, but they were not published until after his death, in 1931.
We shall not go more deeply into the formal aspects of the Ramsey
approach, partly because the axioms are very similar in spirit to those
discussed in Sec. 2.4 dealing with utility differences. Readers are referred
to Davidson and Suppes (1956) and Suppes (1956).

The approach that begins with a consideration of probability rather
than utility originated with de Finetti; a comprehensive statement of this
viewpoint is found in de Finetti (1937), which includes extensive references
to his earlier works on the foundations of probability. The most important
recent work on these matters is Savage's (1954) book, which extended de
Finetti's ideas, in particular by paying greater attention to the behavioral
aspects of decisions, although from the standpoint of experimental
psychology Savage's approach is still very far from a thorough-going
behavioral one. Six relevant articles, including an English translation of
de Finetti (1937), have been reprinted in Kyburg and Smokler (1964).

We do not discuss here the many important ideas of de Finetti and
Savage concerning the foundations of probability and statistics, but we do
want to emphasize those ideas that seem especially important for the
psychological theory of preference and choice.

Perhaps the best place to begin is with de Finetti's axioms for qualitative
probability. In spirit, these axioms are similar to those we considered
earlier for higher ordered metrics. Suppose we ask our subject to tell us for
a variety of pairs of events which of each pair he believes to be the more
probable. The question then arises: how complicated must the conditions
be on the qualitative relation more probable than in order to obtain a
numerical probability measure over events?

The question to be asked of our subject has been formulated in a
nonbehavioral fashion; however, it is easy to give a behavioral method that
leads to the kinds of responses we wish to obtain. Suppose we want
behavioral evidence as to whether the subject thinks event $E$ is more or
less probable than event $F$. Consider two options with outcomes $a$, $b$, or
$c$ and assume that $a$ is preferred to $b$. For example, outcome $a$ might be
winning five cents, $b$ losing five cents, and $c$ winning or losing nothing.
The matrix of the game we present him has the following form:

                             Option 1    Option 2
   $E$                         $a$         $b$
   $F$                         $b$         $a$
   $\overline{E \cup F}$       $c$         $c$

where $\overline{E \cup F}$ is the event that neither $E$ nor $F$ occurs. Thus
our three events are exhaustive, although not necessarily mutually exclusive.
Suppose that the subject chooses Option 1. Under the hypothesis that
he is maximizing subjective expected utility we have the following inequality:

$$s(E)u(a) + s(F)u(b) + s(\overline{E \cup F})u(c) \geq s(E)u(b) + s(F)u(a) + s(\overline{E \cup F})u(c). \tag{11}$$

On the assumption that $a$ is preferred to $b$, that is, that $u(a) > u(b)$,
it follows immediately from Eq. 11 that

$$s(E) \geq s(F).$$

Thus from behavioral data on choices between options and the hypothesis
that subjects are choosing so as to maximize expected utility, we obtain
inequalities on subjective probability that are not based on introspective
data.
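
The inference from a choice to an inequality on subjective probability can be sketched as follows. This Python fragment is an illustration under assumptions made here: the function names are hypothetical, the subject is simulated as an exact expected-utility maximizer, and for the simulation $E$ and $F$ are taken to be mutually exclusive so that the probability of the third event is $1 - s(E) - s(F)$.

```python
from fractions import Fraction

def subjective_order(choice, u_a, u_b):
    """Infer the order of s(E) and s(F) from a choice between the two options.

    Option 1 pays a on E and b on F; Option 2 pays b on E and a on F.
    Assumes expected-utility maximization and u_a > u_b.
    """
    if not u_a > u_b:
        raise ValueError("outcome a must be strictly preferred to b")
    if choice == 1:
        return "s(E) >= s(F)"
    if choice == 2:
        return "s(F) >= s(E)"
    raise ValueError("choice must be 1 or 2")

def simulated_choice(s_E, s_F, u_a, u_b, u_c):
    """Choice of a simulated subject (E and F assumed disjoint here)."""
    s_rest = 1 - s_E - s_F
    eu1 = s_E * u_a + s_F * u_b + s_rest * u_c
    eu2 = s_E * u_b + s_F * u_a + s_rest * u_c
    return 1 if eu1 >= eu2 else 2

first = simulated_choice(Fraction(1, 2), Fraction(3, 10), 5, -5, 0)
second = simulated_choice(Fraction(1, 5), Fraction(3, 5), 5, -5, 0)
print(subjective_order(first, 5, -5), subjective_order(second, 5, -5))
```

The utility of outcome $c$ cancels from the comparison, as it does in Eq. 11, so only the sign of $u(a) - u(b)$ matters for the inference.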

A discussion of experiments whose objective has been the measurement
of subjective probability is reserved until Sec. 4.3.

Corresponding to the analysis of various higher ordered metrics in
Sec. 2.4, it is natural to ask what formal requirements must be placed on
the qualitative relation more probable than in order to guarantee the
existence of a probability measure that reflects the order structure of the
relation.

Let $\succsim$ be the relation of (weakly) more probable than. For the formal
statement of the axioms, it is convenient to assume that the relation
$\succsim$ holds between events that are subsets of a given sample space $X$.
In other words, we use the usual set-theoretical notions for representing
events.

Definition 15. A pair $\langle X, \succsim \rangle$ is a qualitative
probability structure if the following axioms are satisfied for all subsets
$A$, $B$, and $C$ of $X$:

1. if $A \succsim B$ and $B \succsim C$, then $A \succsim C$;

2. $A \succsim B$ or $B \succsim A$;

3. if $A \cap C = \emptyset$ and $B \cap C = \emptyset$, then $A \succsim B$ if and only if $A \cup C \succsim B \cup C$;

4. $A \succsim \emptyset$;

5. not $\emptyset \succsim X$.

The first two axioms just assert that $\succsim$ is a weak ordering of the
subsets of $X$. The third axiom formulates in qualitative terms the important
and essential principle of additivity of mutually exclusive events. The fourth
axiom says that any event is (weakly) more probable than the impossible
event, and the fifth that the certain event is strictly more probable than the
impossible event. Defining the strict relation $\succ$ in the customary
fashion,

$A \succ B$ if and only if not $B \succsim A$,
we may state the fifth axiom as $X \succ \emptyset$. It is a simple matter to
interpret behaviorally each of the axioms in terms of the option scheme
already described.
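
For a small finite sample space the five axioms can be checked by brute force. The Python sketch below is an illustration under assumptions made here: the relation of being (weakly) more probable is induced by a hypothetical numerical measure `weight` on a three-element set, so the axioms are expected to hold; the same loops could be applied to a relation elicited from a subject's choices.

```python
from fractions import Fraction
from itertools import chain, combinations, product

X = ("x1", "x2", "x3")
# Hypothetical point probabilities used to induce the qualitative relation.
weight = {"x1": Fraction(1, 2), "x2": Fraction(1, 3), "x3": Fraction(1, 6)}

def subsets(items):
    """All subsets of `items`, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def P(A):
    """Numerical probability of the event A."""
    return sum((weight[a] for a in A), Fraction(0))

def geq(A, B):
    """A is (weakly) more probable than B."""
    return P(A) >= P(B)

events = subsets(X)
empty, full = frozenset(), frozenset(X)

axiom1 = all(geq(A, C) for A, B, C in product(events, repeat=3)
             if geq(A, B) and geq(B, C))
axiom2 = all(geq(A, B) or geq(B, A) for A, B in product(events, repeat=2))
axiom3 = all(geq(A, B) == geq(A | C, B | C)
             for A, B, C in product(events, repeat=3)
             if not (A & C) and not (B & C))
axiom4 = all(geq(A, empty) for A in events)
axiom5 = not geq(empty, full)
print(axiom1, axiom2, axiom3, axiom4, axiom5)
```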

To give a somewhat deeper sense of the structure imposed by the
axioms, we state some of the intuitively desirable and expected consequences
of the axioms. It is convenient in the statement of theorems to use the
(weakly) less probable relation, defined in the expected manner:

$A \precsim B$ if and only if $B \succsim A$.

The first theorem says that $\precsim$ is an extension of the subset relation.

Theorem 21. If $A \subseteq B$, then $A \precsim B$.

PROOF. Suppose, on the contrary, that not $A \precsim B$, that is, that
$A \succ B$. By hypothesis $A \subseteq B$, so there is a set $C$ disjoint
from $A$ such that $A \cup C = B$. Then, because $A \cup \emptyset = A$, we
have at once

$$A \cup \emptyset = A \succ B = A \cup C,$$

and therefore by contraposition of Axiom 3, $\emptyset \succ C$, which
contradicts Axiom 4.

Theorem 22 If <^ <, A and A Ct B ^ then B <. A B 

Theorem 23 If A B, then S'^A 

Theorem 24 If A B, C D,and A Cx C ^ 4>, then A kjC^BV D 

Theorem 25 If A'^B'^C'OD and C n Z> » then C or 

B'2:D 

Theorem 26 If and C ^ C, then B'^C 
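
The axioms of Def. 15 can be checked mechanically on a small finite sample space. The following is a minimal sketch, not part of the original treatment: it assumes a three-point sample space with hypothetical integer weights standing in for an unnormalized measure, builds the induced relation "weakly more probable than", and tests the five axioms by brute force. The names `subsets`, `relation_from_weights`, `satisfies_def15`, and `by_largest_point` are illustrative only.

```python
from itertools import chain, combinations

def subsets(points):
    """Return every subset of `points` as a frozenset."""
    pts = list(points)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(pts, r) for r in range(len(pts) + 1))]

def relation_from_weights(weights):
    """Build 'A is weakly more probable than B' from additive point weights."""
    def mass(event):
        return sum(weights[x] for x in event)
    return lambda a, b: mass(a) >= mass(b)

def satisfies_def15(points, geq):
    """Brute-force test of Axioms 1-5 of Def. 15 on a finite sample space."""
    events = subsets(points)
    empty, whole = frozenset(), frozenset(points)
    if geq(empty, whole):                                        # Axiom 5
        return False
    for a in events:
        if not geq(a, empty):                                    # Axiom 4
            return False
        for b in events:
            if not (geq(a, b) or geq(b, a)):                     # Axiom 2
                return False
            for c in events:
                if geq(a, b) and geq(b, c) and not geq(a, c):    # Axiom 1
                    return False
                disjoint = not (a & c) and not (b & c)
                if disjoint and geq(a, b) != geq(a | c, b | c):  # Axiom 3
                    return False
    return True

# Hypothetical example: integer weights stand in for an unnormalized measure.
WEIGHTS = {"x": 1, "y": 2, "z": 4}
additive = relation_from_weights(WEIGHTS)

def by_largest_point(a, b):
    """A non-additive contrast case: compare events by their heaviest point."""
    def top(event):
        return max((WEIGHTS[x] for x in event), default=0)
    return top(a) >= top(b)

print(satisfies_def15(WEIGHTS, additive))          # relation induced by a measure
print(satisfies_def15(WEIGHTS, by_largest_point))  # fails the additivity axiom
```

The additive relation passes all five axioms, as any relation induced by a probability measure must. The contrast relation is a weak ordering but fails Axiom 3, which illustrates that the additivity axiom carries the real content of the definition. The converse question, whether every relation passing the check comes from some measure, is the deeper one taken up in the text.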

Because it is relatively easy to prove that a qualitative probability
structure has many of the expected properties, as reflected in the preceding
theorems, it is natural to go on and ask the deeper question whether or
not it has all of the properties necessary to guarantee the existence of a
numerical probability measure $P$ such that for any subsets $A$ and $B$ of $X$

$$P(A) \ge P(B) \text{ if and only if } A \succsim B. \tag{12}$$


If $X$ is an infinite set, it is moderately easy to show that the axioms of
Def. 15 are not strong enough to guarantee the existence of such a
probability measure. General arguments from the logical theory of models in
terms of infinite models of arbitrary cardinality suffice; a particular
counterexample is given in Savage (1954, p. 41). De Finetti (1951)
stressed the desirability of obtaining an answer in the finite case. Kraft,
Pratt, and Seidenberg (1959) showed that the answer is also negative when
$X$ is finite; in fact, they found a counterexample for a set $X$ having five
elements (and thus 32 subsets), but it is too complicated to present here.
It is, of course, apparent that by adding special structural assumptions
to the axioms of Def. 15 it is possible to guarantee the existence of a
probability measure satisfying Eq. 12. In the finite case a general solution
has been found by Scott (1964). (Necessary and sufficient conditions for the
existence of a probability measure in the finite case were formulated by
Kraft et al., but their multiplicative conditions are difficult to understand.
Scott's formulation represents a real gain in simplicity.) The central idea
of Scott's formulation is to impose an algebraic condition on the
characteristic functions of the events. Recall that the characteristic
function of a set assigns the value 1 to elements of the set and the value 0
to all elements outside the set. If $A$ is a set, we denote its
characteristic function by $A^c$. Scott's conditions are embodied in the
following theorem, whose proof we do not give.

Theorem 27. Let $X$ be a finite set and $\succsim$ a binary relation on the
subsets of $X$. Necessary and sufficient conditions that there exist a
probability measure $P$ on $X$ satisfying Eq. 12 are the following: for all
subsets $A$ and $B$ of $X$,

1. $A \succsim B$ or $B \succsim A$;

2. $A \succsim \emptyset$;

3. $X \succ \emptyset$;

4. for all subsets $A_0, \ldots, A_n, B_0, \ldots, B_n$ of $X$, if
$A_i \succsim B_i$ for $0 \le i < n$, and for all $x$ in $X$

$$A_0^c(x) + \cdots + A_n^c(x) = B_0^c(x) + \cdots + B_n^c(x),$$

then $A_n \precsim B_n$.

To illustrate the force of Scott's Condition 4, we may see how it implies
transitivity. First, for any three characteristic functions it is apparent
that

$$A^c + B^c + C^c = B^c + C^c + A^c,$$

that is, for all elements $x$,

$$A^c(x) + B^c(x) + C^c(x) = B^c(x) + C^c(x) + A^c(x).$$

By hypothesis, $A \succsim B$ and $B \succsim C$, and therefore by virtue of
Condition 4, $C \precsim A$, and thus, by definition, $A \succsim C$. The
algebraic equation of Condition 4 just requires that any element of $X$, that
is, any atomic event, belong to exactly the same number of the $A_i$ as of
the $B_i$. Obviously, this algebraic condition cannot be formulated in the
simple set language of Def. 15 and thus represents quite a strong condition.
It is, however, an … in the sense of Sec. 2.3.

Where $X$ is infinite, a number of strong structural conditions have been
shown to be sufficient but not necessary. For example, de Finetti (1937)
and independently Koopman (1940a, 1940b, 1941) used an axiom to the
effect that there exist partitions of $X$ into arbitrarily many events
equivalent in probability. This axiom, together with those of Def. 15, is
sufficient to prove the existence of a numerical probability measure. Other
related conditions of a similar existential sort are discussed in Savage
(1954). Extending the methods used to prove Theorem 27, Scott has improved on
these earlier results for the infinite case and has found properties that are
both necessary and sufficient, but we do not state his somewhat complicated
conditions here.

Thus far we have not explicitly considered sets of axioms that characterize
utility and subjective probability together, although we began this
subsection with a discussion of the alternative approaches that originate
with Ramsey and de Finetti. We conclude with a sketch of the best known
set of axioms, namely, those of Savage (1954). Savage has a single primitive
relation $\succsim$ of weak preference on decisions or acts, and decisions
are themselves functions from the set $S$ of states of nature to the set $A$
of consequences. In terms of $\succsim$ it is fairly straightforward to
define a corresponding relation of preference on consequences and one of
comparative probability on the states of nature. Assuming these two
additional relations, Savage's seven postulates may be formulated verbally as
follows.


1. The relation $\succsim$ is a weak ordering of the set $D$ of decisions.

2. Given two decisions restricted to a subset of the states, then one is
weakly preferred to the other.

3. Given two decisions, each of which has a constant outcome or
consequence on a subset $X$ of states of nature, then one decision is weakly
preferred to the other given the set $X$ if and only if the constant outcome
of the first decision is weakly preferred to that of the second decision.

4. For any two sets of states of nature, $X$ and $Y$, one is (weakly) more
probable than the other, that is, $X \succsim Y$ or $Y \succsim X$.

5. All consequences are not equally preferred.

6. If decision $f$ is strictly preferred to $g$, then there is a partition of
$S$ so fine that if $f'$ agrees with $f$ and $g'$ agrees with $g$, except on
one element of the partition, then $f$ is strictly preferred to $g'$ and $f'$
is strictly preferred to $g$.

7. If every possible consequence of the decision $f$ is at least as attractive
as the decision $g$ considered as a whole, then $f$ is weakly preferred to $g$.


Postulate 7 formulates the sure-thing principle about which we shall
have more to say in the next subsection.

Because the formal statement of these seven axioms requires several
rather complex definitions and to avoid overlong … (1957). On the basis of
these axioms Savage … the partition, $f \succsim g$ if and only if

$$\sum_i s(X_i)\, u[f(X_i)] \ge \sum_i s(X_i)\, u[g(X_i)].$$

From a behavioral standpoint the most serious weakness of Savage's
system, and all those similar to it, is the use of constant decisions, that
is, those decisions that have a constant outcome independent of the state of
nature. In actual practice it is rare indeed for such decisions to be
realizable; each outcome cannot … states of nature. Unfortunately, there
seems to be no … method of getting around their use. In Savage's own
technical discussion, the constant decisions are used both to define the
ordering on consequences and the ordering on states. (For additional
discussion of these matters, see Suppes, 1956.)


3.4 Other Decision Principles


A great deal of the modern literature about decisions made in uncertain
situations has concentrated on decision principles other than the
maximization of expected utility. The motivation for this work is the
recognition that, in uncertain situations, the decision maker usually does
not have information adequate to assign probabilities to the uncertain events
that, in part, determine the outcome that results. This is an especially
serious problem when the uncertainty arises not only from random factors in
the environment but also from the more or less rational decisions made by
other people or organizations. If the decision maker is unable, or unwilling,
to act as if he knows the probability of each event occurring, then he must
invoke some weaker decision principle than the maximization of expected
utility, some principle that depends on incomplete information about the
relative likelihoods of events. The formulation and mathematical … and
Girshick (1954), Luce and Raiffa (1957), Raiffa and Schlaifer (1961),
and von Neumann and Morgenstern (1944, 1947, 1953).

Beyond any doubt, the simplest and least controversial principle of
rational behavior, one that does not presume any knowledge at all
about the relative likelihoods of the relevant events, is the sure-thing
principle. It asserts that if two strategies (that is, possible decisions or
acts available to the decision maker) $a$ and $a'$ are such that for each
possible event (that is, state of nature or strategies of other decision
makers), the outcome that results from the choice of $a$ is at least as
desirable as the one that results from $a'$ and that for at least one event
the outcome from $a$ is strictly preferred to that from $a'$, then strategy
$a$ is better than strategy $a'$. A rational person is assumed to choose $a$
over $a'$.

The main weakness of the sure-thing principle as a guide to, or description
of, behavior is that it can so rarely be applied; in general, of two
strategies neither is better than the other in the preceding sense. To show,
however, that it is not totally without force, consider the famous game known
as the prisoner's dilemma. There are two players (as decision makers are
called in game theory), $A$ and $B$, each with two strategies, and the payoffs
(in money or utility units, as the case may be) are given by

$$
\begin{array}{c|cc}
 & b_1 & b_2 \\ \hline
a_1 & (5, 5) & (-10, 10) \\
a_2 & (10, -10) & (-1, -1)
\end{array}
$$

where the rows are the strategies of player $A$ and the columns those of
player $B$; player $A$ receives the first payoff in the cell selected and
player $B$ the second one. Now, looking at the situation from $A$'s point of
view, if $B$ chooses $b_1$, then clearly $a_2$ is better than $a_1$ since, by
definition, 10 is preferred to 5, and equally well, if $b_2$ is chosen, $a_2$
is better than $a_1$ since $-1$ is preferred to $-10$. Hence, by the
sure-thing principle, $A$ should choose $a_2$. An exactly parallel argument
leads $B$ to choose $b_2$, and so the resulting payoff is $(-1, -1)$.
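
The sure-thing comparison just described is a plain dominance check, which can be sketched as follows. This is an illustrative sketch only: the list-of-lists encoding of the prisoner's dilemma payoffs and the function names `payoffs_for` and `is_better` are assumptions made for the example, not notation from the text.

```python
# Each cell holds (payoff to A, payoff to B); rows are A's strategies a1, a2
# and columns are B's strategies b1, b2, following the prisoner's dilemma
# payoffs given in the text.
PRISONERS_DILEMMA = [[(5, 5), (-10, 10)],
                     [(10, -10), (-1, -1)]]

def payoffs_for(game, player, strategy):
    """Payoffs to `player` (0 = A, 1 = B) from one of his own strategies,
    listed over the opponent's strategies."""
    if player == 0:
        return [cell[0] for cell in game[strategy]]
    return [row[strategy][1] for row in game]

def is_better(game, player, s, t):
    """Sure-thing comparison: s is never worse than t for `player`
    and is strictly better against at least one opposing choice."""
    ps = payoffs_for(game, player, s)
    pt = payoffs_for(game, player, t)
    return (all(x >= y for x, y in zip(ps, pt))
            and any(x > y for x, y in zip(ps, pt)))

print(is_better(PRISONERS_DILEMMA, 0, 1, 0))  # a2 compared with a1 for A
print(is_better(PRISONERS_DILEMMA, 1, 1, 0))  # b2 compared with b1 for B
```

Both comparisons succeed, which reproduces the argument that the sure-thing principle selects the second strategy for each player.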

This is an odd, somewhat disturbing result from a social standpoint
since it is evident that the strategy choice $(a_1, b_1)$, which leads to the
outcome $(5, 5)$, is preferable to both players. Indeed, we can formulate a
social version of the sure-thing principle that dictates this choice. A
strategy pair $(a, b)$ is better than the pair $(a', b')$ provided that the
outcome from $(a, b)$ is at least as good for both players and is strictly
preferred by one of them to the outcome from $(a', b')$. Any strategy pair
such that no other pair is better in this sense is said to be Pareto optimal.
It is easy to see in the prisoner's dilemma that $(a_1, b_1)$ is Pareto
optimal. A rational social decision principle should, it is generally agreed,
result in Pareto optimal strategies. (This notion can be generalized in the
obvious way to games involving three or more players; we do not spell out the
details.)

The prisoner's dilemma makes it quite clear that the sure-thing principle
for individuals is not generally consistent with the sure-thing principle
(Pareto optimality) for social conflicts. Since it is not usually possible to
enforce social principles without a good deal of machinery beyond the
decision situation itself and since the existing empirical data (Deutsch,
1958, 1960; Lutzker, 1960; Rapoport, 1963, pp. 558-561; Scodel,
Minas, Ratoosh, & Lipetz, 1959) strongly suggest that under normal
circumstances the individual principle overrides the social one, it seems
appropriate to attempt to generalize the individual sure-thing principle
to cover decision situations to which it cannot itself be applied.
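
The social version of the sure-thing principle can be sketched the same way. The code below is a hedged illustration: the matrix encoding and the name `pareto_optimal_pairs` are assumptions of this example, and strategy pairs are reported as zero-based (row, column) indices rather than in the text's subscript notation.

```python
PRISONERS_DILEMMA = [[(5, 5), (-10, 10)],
                     [(10, -10), (-1, -1)]]

def pareto_optimal_pairs(game):
    """Return the (row, column) pairs to which no other pair is 'better'
    in the social sense: at least as good for both, strictly better for one."""
    cells = [(i, j) for i in range(len(game)) for j in range(len(game[0]))]

    def better(p, q):
        (pa, pb), (qa, qb) = game[p[0]][p[1]], game[q[0]][q[1]]
        return pa >= qa and pb >= qb and (pa > qa or pb > qb)

    return [q for q in cells if not any(better(p, q) for p in cells)]

print(pareto_optimal_pairs(PRISONERS_DILEMMA))
```

Under the definition as stated, the pair with payoff (5, 5) is Pareto optimal and the pair with payoff (-1, -1) is not. The two asymmetric pairs also qualify, since no other pair is at least as good for both players, so Pareto optimality alone does not single out one outcome here.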

One idea is to search for strategies, one for each player, such that no
person acting alone can benefit himself by a change of strategy. We make
this precise for two players; the generalization to any number is obvious.
If $(a, b)$ is a pair of strategies, let $M_i(a, b)$ denote the corresponding
outcome to player $i$. The strategy pair $(a, b)$ is said to be in
equilibrium if for any other strategy $a'$ of player $A$,

$$M_1(a', b) \le M_1(a, b),$$

and if for any other strategy $b'$ of player $B$,

$$M_2(a, b') \le M_2(a, b).$$

In the prisoner's dilemma, $(a_2, b_2)$ is in equilibrium. In the game

$$
\begin{array}{c|cc}
 & b_1 & b_2 \\ \hline
a_1 & (2, 1) & (-1, -1) \\
a_2 & (-1, -1) & (1, 2)
\end{array}
$$

the sure-thing principle is powerless, but it is easy to see that both
$(a_1, b_1)$ and $(a_2, b_2)$ are in equilibrium. Thus the notion of an
equilibrium pair extends the class of games to which considerations of
individual rationality can be applied. Nevertheless, it has two failings.
First, as the preceding example shows, it does not tell an individual player
what to do to get one of the desired outcomes: if player $A$ were to choose
$a_1$ in an attempt to get the $(2, 1)$ outcome and player $B$ were to choose
$b_2$ in an attempt to get $(1, 2)$, then the nonequilibrium pair
$(a_1, b_2)$ would result, and the players would receive the nondesired
outcome $(-1, -1)$. Second, the equilibrium notion is incapable of resolving
all games since, for example,

$$
\begin{array}{c|cc}
 & b_1 & b_2 \\ \hline
a_1 & (1, -1) & (-1, 1) \\
a_2 & (-1, 1) & (1, -1)
\end{array}
$$

has no pair of strategies that are in equilibrium, as is easily verified. We
take up the second problem first.
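
The equilibrium condition for pure strategies is easy to verify by enumeration. The following sketch assumes the three games discussed above are encoded as lists of (payoff to A, payoff to B) cells; the constant names and the function `pure_equilibria` are illustrative, and pairs are reported as zero-based (row, column) indices.

```python
PRISONERS_DILEMMA = [[(5, 5), (-10, 10)],
                     [(10, -10), (-1, -1)]]
TWO_EQUILIBRIA_GAME = [[(2, 1), (-1, -1)],
                       [(-1, -1), (1, 2)]]
NO_EQUILIBRIUM_GAME = [[(1, -1), (-1, 1)],
                       [(-1, 1), (1, -1)]]

def pure_equilibria(game):
    """List the pure strategy pairs from which neither player can gain
    by changing his own strategy while the other's stays fixed."""
    rows, cols = range(len(game)), range(len(game[0]))
    result = []
    for i in rows:
        for j in cols:
            a_ok = all(game[k][j][0] <= game[i][j][0] for k in rows)
            b_ok = all(game[i][k][1] <= game[i][j][1] for k in cols)
            if a_ok and b_ok:
                result.append((i, j))
    return result

for name, game in [("prisoner's dilemma", PRISONERS_DILEMMA),
                   ("second game", TWO_EQUILIBRIA_GAME),
                   ("third game", NO_EQUILIBRIUM_GAME)]:
    print(name, pure_equilibria(game))
```

The enumeration finds one equilibrium pair in the prisoner's dilemma, two in the second game, and none in the third, matching the three claims made in the text.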

The major resolution that has been suggested does not involve a further
weakening of the decision principle, but rather the given discrete game is
extended to a continuous one by enriching greatly the strategies available
to each player. Von Neumann (1928; see also von Neumann and Morgenstern,
1944, 1947, 1953) pointed out that whenever a set of strategies is
available, then so are all the probability distributions over it. A player can
generate any distribution he chooses by using suitable auxiliary devices.
Thus, if his preferences satisfy the axioms for expected utility when the
probabilities are known, as they are when he generates them himself, then
the expected payoff can be calculated for each probability distribution over
the given strategies, that is, for each mixed strategy, as such a distribution
over strategies is called. This was, in fact, the main purpose for which
von Neumann and Morgenstern first developed their theory of expected
utility. Within the context of mixed strategies, it can be shown that every
game has at least one pair of (mixed) strategies that are in equilibrium
(Nash, 1950). This by no means trivial result is a partial generalization
of the famous minimax theorem of von Neumann (1928), which not only
established the existence of mixed strategy equilibrium pairs in those
two-person games for which the payoff to one player is always the negative of
that to the other (the so-called zero-sum games), but also showed that if
$(a, b)$ and $(a', b')$ are in equilibrium, then so are $(a, b')$ and
$(a', b)$, and that the payoffs to a player are the same for all pairs of
strategies that are in equilibrium. Neither of these last two statements is
generally true for nonzero-sum games, as can be seen in the second game, nor
for games with more than two players. Thus, although the first problem
mentioned about equilibrium strategies, namely, that they do not prescribe
rational behavior for the individual, is not a problem when we restrict our
attention to two-person zero-sum games, it is for other classes of games.
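
To make the mixed-strategy extension concrete, the sketch below checks a candidate mixed equilibrium in the zero-sum game that has no pure equilibrium. It is an illustration under stated assumptions: the candidate (each player mixing half and half) is supplied by hand rather than computed, and the helper names `expected_payoffs`, `pure`, and `is_mixed_equilibrium` are hypothetical. Because expected payoff is linear in a player's own mixture, it suffices to test deviations to pure strategies.

```python
from fractions import Fraction

NO_PURE_EQUILIBRIUM_GAME = [[(1, -1), (-1, 1)],
                            [(-1, 1), (1, -1)]]

def expected_payoffs(game, p, q):
    """Expected (A, B) payoffs when A mixes rows with p and B mixes columns with q."""
    cells = [(i, j) for i in range(len(p)) for j in range(len(q))]
    ea = sum(p[i] * q[j] * game[i][j][0] for i, j in cells)
    eb = sum(p[i] * q[j] * game[i][j][1] for i, j in cells)
    return ea, eb

def pure(n, k):
    """Degenerate distribution over n strategies with probability 1 on strategy k."""
    return [Fraction(int(i == k)) for i in range(n)]

def is_mixed_equilibrium(game, p, q):
    """True if no deviation to a pure strategy raises either player's expected payoff."""
    ea, eb = expected_payoffs(game, p, q)
    a_ok = all(expected_payoffs(game, pure(len(p), i), q)[0] <= ea
               for i in range(len(p)))
    b_ok = all(expected_payoffs(game, p, pure(len(q), j))[1] <= eb
               for j in range(len(q)))
    return a_ok and b_ok

half = [Fraction(1, 2), Fraction(1, 2)]
print(expected_payoffs(NO_PURE_EQUILIBRIUM_GAME, half, half))
print(is_mixed_equilibrium(NO_PURE_EQUILIBRIUM_GAME, half, half))
```

With both players mixing equally, each expected payoff is zero and neither player can improve by deviating, so the pair is in equilibrium even though no pure pair is.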

An alternative approach to solving the two weaknesses of the equilibrium
notion is suggested by an important property of equilibrium strategies in
the two-person zero-sum case. For each of a player's strategies, suppose
that we determine what is the worst (minimum) outcome that can occur,
and then find those strategies for which this worst outcome is as good as
possible (that is, find the strategy having the maximum value of the
minimum outcome). Such a strategy is called, for obvious reasons, a maximin
strategy. For two-person zero-sum games, it can be shown that a pair
$(a, b)$ of mixed strategies is in equilibrium if and only if $a$ and $b$ are
both maximin mixed strategies. Observe that the notion of a maximin strategy
is defined for any game and that such a strategy always exists whether
we restrict ourselves to pure strategies or whether mixed strategies are
admitted; however, except for two-person zero-sum games, the notion of
maximin strategies does not have any simple relation to that of equilibrium
strategies. The principle underlying maximin strategies is conservative in
the extreme: concern is focused exclusively on the worst possible outcome
from any course of action, no matter how improbable that outcome may be.
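
For pure strategies the maximin rule reduces to two passes over the payoff matrix. The sketch below is restricted to pure strategies (finding maximin mixed strategies would require linear programming, which is not attempted here); the encoding and the name `pure_maximin` are assumptions of the example.

```python
PRISONERS_DILEMMA = [[(5, 5), (-10, 10)],
                     [(10, -10), (-1, -1)]]

def pure_maximin(game, player):
    """Return (security level, maximin pure strategies) for player 0 (A) or 1 (B).

    The worst outcome of each own strategy is found first, and then the
    strategies whose worst outcome is largest are kept.
    """
    if player == 0:
        worst = [min(cell[0] for cell in row) for row in game]
    else:
        worst = [min(row[j][1] for row in game) for j in range(len(game[0]))]
    best = max(worst)
    return best, [s for s, w in enumerate(worst) if w == best]

print(pure_maximin(PRISONERS_DILEMMA, 0))  # player A
print(pure_maximin(PRISONERS_DILEMMA, 1))  # player B
```

In the prisoner's dilemma each player's maximin pure strategy is his second one, with a guaranteed payoff of -1, so here the maximin choice coincides with the sure-thing choice.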

Savage (1951) suggested that this last notion should be applied not to
the given matrix of payoffs, or to its mixed strategy extension, but rather
to what he called the regret matrix, which is constructed as follows. For
a given player, replace the outcome in a given cell by the following number:
find the best outcome (or expected outcome, as the case may be) that he
could have achieved by using any of his (mixed) strategies, on the
assumption that the choices of chance and other players are fixed, and from
it subtract the actual value in that cell. For the prisoner's dilemma, this
yields the transformation

$$
\begin{bmatrix} (5, 5) & (-10, 10) \\ (10, -10) & (-1, -1) \end{bmatrix}
\rightarrow
\begin{bmatrix} (5, 5) & (9, 0) \\ (0, 9) & (0, 0) \end{bmatrix}.
$$

For example, the $-10$ in the $(a_1, b_2)$ cell becomes 9 because, with $b_2$
fixed, the best player $A$ could have done is $-1$ by choosing $a_2$, and that
minus the $-10$ of the entry yields 9, which is his "regret" in having chosen
$a_1$ rather than $a_2$ when $B$ chooses $b_2$. The 10 in the same cell for
player $B$ becomes 0 because the best he could do with $a_1$ fixed is 10,
which minus the 10 of the cell yields 0.

The proposed decision rule is to choose any strategy that minimizes
the maximum regret. In the prisoner's dilemma this rule again dictates
$(a_2, b_2)$, the solution previously obtained by the sure-thing principle.
In the following game

$$
\begin{bmatrix} (2, 1) & (-1, -1) \\ (-1, -1) & (1, 2) \end{bmatrix}
\rightarrow
\begin{bmatrix} (0, 0) & (2, 2) \\ (3, 3) & (0, 0) \end{bmatrix},
$$

the rule yields $(a_1, b_2)$, which is not one of the two equilibrium pairs of
the game, but rather what one would expect if each player attempted to get
his preferred equilibrium pair. Thus, at least in this case, the minimum
regret criterion seems more descriptive than the equilibrium notion.

Savage's proposal, although intuitively reasonable, suffers from the
defect that no theory of utility now in existence is adequate to justify
calculating the differences required for the regret matrix and, at the same
time, to justify the expected utility property which is needed if mixed
strategies are to be used.
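
The regret construction and the resulting decision rule can be sketched for pure strategies as follows. This is a minimal illustration, not Savage's own formulation: it ignores mixed strategies, breaks ties by taking the first minimizing strategy, and uses hypothetical names (`regret_matrix`, `minimax_regret`) and zero-based indices.

```python
PRISONERS_DILEMMA = [[(5, 5), (-10, 10)],
                     [(10, -10), (-1, -1)]]
TWO_EQUILIBRIA_GAME = [[(2, 1), (-1, -1)],
                       [(-1, -1), (1, 2)]]

def regret_matrix(game):
    """Replace each payoff by (best achievable with the other's choice fixed) - payoff."""
    rows, cols = range(len(game)), range(len(game[0]))
    best_a = [max(game[i][j][0] for i in rows) for j in cols]  # column j fixed
    best_b = [max(game[i][j][1] for j in cols) for i in rows]  # row i fixed
    return [[(best_a[j] - game[i][j][0], best_b[i] - game[i][j][1])
             for j in cols] for i in rows]

def minimax_regret(game):
    """Pure strategies (row for A, column for B) minimizing each player's maximum regret."""
    reg = regret_matrix(game)
    rows, cols = range(len(reg)), range(len(reg[0]))
    a_choice = min(rows, key=lambda i: max(reg[i][j][0] for j in cols))
    b_choice = min(cols, key=lambda j: max(reg[i][j][1] for i in rows))
    return a_choice, b_choice

print(regret_matrix(PRISONERS_DILEMMA))
print(minimax_regret(PRISONERS_DILEMMA))
print(regret_matrix(TWO_EQUILIBRIA_GAME))
print(minimax_regret(TWO_EQUILIBRIA_GAME))
```

For the prisoner's dilemma the computed regrets agree with the transformation in the text and the rule picks the second strategy for both players. For the second game the rule picks the first row and the second column, the nonequilibrium pair discussed above.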

Before turning to other ideas, it should be mentioned that several
authors, most notably Milnor (1954), have attempted to gain a better
understanding of these various decision criteria by means of an axiomatic
approach. They have listed various "elementary" axioms that a decision
principle might satisfy, and they have shown which sets of these axioms are
necessary and sufficient to characterize a given principle. These results
give some insight into the various concepts of rationality that are embodied
in and excluded from the different decision principles. For a summary of
these results, see Luce and Raiffa (1957, pp. 286-309).

The decision principles that we have just discussed all differ sharply
from the principle of expected utility maximization in that they presume
absolutely no information about the relative likelihoods of the events that
affect the outcome, whereas the latter principle can be applied only when
the decision maker is willing to commit himself in advance to relative
likelihoods. Neither extreme seems to capture the reality of most decision
problems, which characteristically involve some qualitative information
about the relative likelihoods of events, but not enough to justify the
assignment of a unique probability measure to them. Little, if any,
satisfactory axiomatic theory yet exists to deal with decisions in the
presence of some but incomplete information; however, some relevant ideas
have been discussed, and so we now turn our attention to them.


One framework of discussion centers about the concept of level of
aspiration, initially introduced by Dembo (1931) and now familiar to most
psychologists. The first direct comparison of it with a utility analysis of
choice behavior was made by Simon (1955, 1956). Simon illustrated his
ideas by the familiar example of an individual selling his house. Each day
he sets an acceptance price based on various factors: bids he has received
in the past, rumors about prevailing market conditions, his urgency in
consummating a sale, etc. If during the day he receives one or more offers
above this price, he accepts the highest. If not, he sets a new acceptance
price the next day and awaits further offers. The acceptance price he
sets each day is his level of aspiration (for selling the house) at that
moment in time. Simon contends that this kind of model, including the
mechanisms for changing the level of aspiration with experience, is much
closer than the expected utility model to the decision process actually used
when the alternatives are complex and information about them is far from
complete. Simon has given a certain amount of theoretical development in the
two articles cited, but at present the formal aspects, especially the
mechanism for processing information, are far from fully stated.

Some possible theoretical connections between utility theory and level
of aspiration were postulated by Siegel (1957). For him, the level of
aspiration is a point on the individual's utility curve. All outcomes below
this point have negative utility and all above it, positive utility. When the
number of outcomes is finite, Siegel defines the level of aspiration to be
that goal which has the largest difference in utility between it and the
next lower goal. However, … applications of these ideas are … probably must
ultimately be incorporated into any adequate theory of decision making, …
fashion or tested extensively.

… Shackle (1949, 1955) … potential surprise … stated with any great formal
clarity … have they been applied to any experiments; they were developed to
handle various difficulties that arise in connection with expectations in ….
In spite of its separation from most psychological work on choice behavior,
Shackle's writings include a number of psychologically insightful and
original remarks about choice behavior.

Shackle assumes that to each possible future event there can be assigned a
degree of potential surprise, and rules are given for combining potential
surprises. These differ from the usual rules for dealing with probabilities
mainly in being nonadditive. Specifically, if $\eta(E)$ denotes the potential
surprise of event $E$, then the combining rules are essentially the following:

(i) if $E_1 \cap E_2 = \emptyset$, then
$\eta(E_1 \cup E_2) = \min[\eta(E_1), \eta(E_2)]$;

(ii) $\eta(E \cap F) = \max[\eta(E \mid F), \eta(F)]$, where
$\eta(E \mid F)$ is the potential surprise of $E$ given that $F$ has occurred.

Thus, when $E$ and $F$ are independent, that is, $\eta(E \mid F) = \eta(E)$,
rule (ii) reduces to

$$\eta(E \cap F) = \max[\eta(E), \eta(F)].$$

Arrow (1951a, b, 1964) pointed out that … "there is no law of large numbers
or … of independent events, for the potential surprise attached to a sequence
of unfavorable outcomes is as large as …" How unrealistic such a calculus can
be is illustrated by a simple example. A run of 10 successive heads in as
many flips of a fair coin would ordinarily be assigned a positive potential
surprise, yet in Shackle's calculus it is assigned a potential surprise of
zero. We can see this as follows. Consider first one flip. Let $H$ be the
event of a head and $T$ the event of a tail. Since $H \cup T$ is certain,
$\eta(H \cup T) = 0$, and because $H \cap T = \emptyset$, rule (i) yields

$$\min[\eta(H), \eta(T)] = 0.$$

Since surely $\eta(H) = \eta(T)$, we have

$$\eta(H) = 0.$$

Now, making the usual assumption of independence,

$$\eta(H_1 \cap H_2 \cap \cdots \cap H_{10}) = \max[\eta(H_1), \ldots, \eta(H_{10})] = 0.$$

Clearly, 10 plays no role in this argument: any finite number of successive
heads has zero potential surprise.
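
The coin-flip argument can be traced step by step in a few lines. The sketch below assumes the two combining rules in the form used in the argument: the potential surprise of a union of mutually exclusive events is the minimum of the component surprises, and that of an intersection of independent events is the maximum. The function names are hypothetical, and the ordinary probability is computed alongside only for contrast.

```python
def surprise_of_union(*surprises):
    """Rule (i): potential surprise of a union of mutually exclusive events."""
    return min(surprises)

def surprise_of_joint(*surprises):
    """Rule (ii) under independence: potential surprise of an intersection."""
    return max(surprises)

def surprise_of_head():
    """Potential surprise of a head on one flip of a fair coin.

    With eta(H) == eta(T) == e, rule (i) gives min(e, e) == e for the
    certain event 'head or tail', whose surprise is 0, so e must be 0.
    """
    eta = 0
    assert surprise_of_union(eta, eta) == 0  # consistency with rule (i)
    return eta

def compare(n):
    """Potential surprise and ordinary probability of n successive heads."""
    eta_run = surprise_of_joint(*[surprise_of_head()] * n)
    probability = 0.5 ** n
    return eta_run, probability

for n in (1, 10, 100):
    print(n, compare(n))
```

However long the run, the potential surprise stays at zero while the ordinary probability shrinks geometrically, which is the contrast the example in the text is meant to bring out.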

Shackle has a good deal to say in favor of his rules in spite of these
consequences. Essentially, his defense rests on the claim that in real life
situations we are never in a position to make the probability computations
of textbook examples; such computations are possible only for divisible
and repeatable experiments. Indeed, he has argued against the general
applicability of either a subjective or frequency theory of probability, but
his remarks were mainly directed toward the frequency theory and he has
not dealt in an explicit and technical way with the many persuasive
arguments put forward by subjectivists, such as de Finetti.


4. EXPERIMENTAL TESTS OF ALGEBRAIC MODELS

Over the past fifteen years a number of experimental studies have been
performed that are more or less relevant to the algebraic theories of
preference discussed in Secs. 2 and 3, but unfortunately few bear directly
on the empirical validity of the various theories discussed. There are
several reasons for this. The most important is that the algebraic theories
of preference are special cases of algebraic theories of measurement, and
as such do not readily lend themselves to an exact statistical analysis of
their relation to experimental data. For example, the axioms for the
various ordered metrics are formulated in such a fashion that it is natural
simply to ask whether or not a set of data satisfies the axioms, not to
attempt a statistical goodness-of-fit evaluation. In fact, it is not entirely
clear how the theory of goodness-of-fit should be stated for the various
kinds of higher ordered metrics. In models similar to those of von Neumann
and Morgenstern the basic closure postulate requires an infinity of objects,
and again it is not clear when we should be willing to regard a set of data
as satisfying or definitely not satisfying the axioms. We would not expect,
of course, any finite sample of data to satisfy the axioms exactly, and the
difficult question is how to formulate an appropriate criterion.

In view of such difficulties, it is not surprising that the bulk of the
experimental literature is concerned with studies designed to test certain
limited aspects of choice behavior. Many of the main points that have
been established are rather far removed from the theoretical issues we have
discussed in the preceding sections. In the interest of completeness we
describe the studies of greatest theoretical import, but because the
accumulated literature is relatively large we do not attempt a really
complete survey. The reader interested in additional references should
consult the survey articles mentioned in Sec. 1.4.

Our summary of these experimental studies is organized in terms of
the major headings of Secs. 2 and 3, not in strict chronology. Under a
given heading, however, the order is primarily chronological. The first
section deals with higher ordered metrics, the second with the measurement
of utility, the third with the measurement of subjective probability, and the
fourth with other models that have been proposed as a result of
experimental conjectures or results but which we have not discussed with any
thoroughness in the preceding sections.


4.1 Higher Ordered Metrics

Coombs' (1950) concept of a higher ordered metric has been applied to
experimental studies of preference by Coombs and his collaborators in a
number of studies and also by Siegel (1956) and Hurst and Siegel (1956).

Coombs (1954) applied his ideas about ordered metrics and the unfolding
technique to judgments of the esthetic quality of a series of
isosceles triangles, all of which had a base of 1 in. and an altitude that
varied from 0.25 to 2.5 in. in steps of 0.25 in. Sets of three triangles were
presented to subjects who judged both the most and least preferred in each
triad. All 120 possible sets of three triangles formed from a set of 10 were
presented. (It should be emphasized that this use of the method of triads
is not essential to the application of Coombs' ordered metric concepts.)
In analyzing the data, it was assumed that the most preferred triangle
was located nearest the subject's own "ideal" triangle and that the least
preferred triangle was located furthest away. The data may be described
as follows. The unfolded data from 17 of the 31 subjects satisfied a
common interval scale for the 10 stimulus triangles. The responses of 8
additional subjects satisfied the ordinal properties of this common scale,
but they required different metric relations. The 6 remaining subjects
failed to satisfy the ordinal properties of the common scale. It is to be
emphasized that the imposition of the possibility of unfolding the data to
get a common ordering of stimuli and subjects is, of course, a much stronger
assumption than simply assuming that each subject has his own weak
higher ordered metric. Considering the subjectivity usually associated
with such esthetic preferences, it is somewhat surprising that the data from
17 of the subjects could be interpreted in terms of a common strict higher
ordered metric.

A useful discussion of some of the problems and mathematical
consequences of determining an ordered metric from experimental data was
given by Coombs and Beardslee (1954), but the only data they reported
are from a pilot experiment with one subject, so we do not review the
results here.

In moving from a strong higher ordered metric to a strict higher ordered
metric, in the sense of the definitions given in Sec. 2.4, it is necessary but
not sufficient for adjacent utility intervals to be additive. Coombs and
Komorita (1958) investigated the extent to which additivity is satisfied in
an experimental ordered metric for the utility of money. Only three
graduate student subjects participated in the experiment, but a fairly large
number of preferences were required from each. A method of triads was
used that was similar to the one just described. The experimental data
quite strongly supported the additivity hypothesis. In particular, of 30
predictions tested, 29 were confirmed. It should be mentioned that one
can expect the results to be best when the outcomes are money, because the
natural order of preference is very strongly entrenched in all subjects. One
never expects a subject to prefer less to more money, but inconsistencies
of simple preference are common with nonmonetary outcomes.

Two other papers by Coombs (1958, 1959) are also concerned with the
problems of applying the unfolding technique to obtain an ordered metric,
but because the preferences were mostly probabilistic, the summary of this
experiment is reserved until Sec. 8.2.

Siegel (1956) described a method for obtaining a strong ordered metric
scale in the sense of Def. 8 of Sec. 2.4. He used simple gambles constructed
from events with subjective probabilities of one half, following the
procedure suggested in Davidson, Suppes, and Siegel (1957), to obtain not
only an ordering of alternatives but also a complete ordering of the
differences between alternatives. The ordinal comparison of utility
intervals in Siegel's method is based on the kinds of options and the
resulting equations discussed in Sec. 3.3 (see, in particular, Inequality 6).
Because the original article presented only a pilot study based on one
subject, we do not consider it further. However, Hurst and Siegel (1956),
using the same methodology, ran 30 prison inmates with cigarettes as outcomes
rather than money. Of the 30 subjects, 29 completed the experiment. The
following is a typical offer:

              Option 1    Option 2
      QUJ       +30          +5
      QUG       -10          +5

where QUJ is a nonsense syllable printed on three sides of a die and QUG
is a nonsense syllable printed on the other three sides. Sufficient options
were used in order to construct an ordered metric (in the sense of Def. 8,
Sec. 2.4) for each subject, and from this metric a number of choices was
predicted between additional options.

The main hypothesis tested was the accuracy of these predictions as
compared with those based on the model in which it is assumed the subject
maximizes the expected number of cigarettes. For 6 subjects the two
models gave identical predictions, which means that their subjective
utility scales were essentially linear in the number of cigarettes. For
15 subjects the ordered metric model yielded better predictions than the
simple expectation model, but data for 8 subjects fell in the opposite
direction. For an over-all statistical test, the total number of erroneous
predictions for each of the two models was determined for each subject,
and the differences between these two totals were ranked. A Wilcoxon
matched-pairs, signed-ranks test indicated that the observed rank sums
differed significantly from what would be expected under the null
hypothesis and favored the ordered metric model ($p < 0.025$).

Because of the close relation between the additivity assumptions
discussed in Sec. 2.3 and the ordered metric models, we mention at this point
an experiment by Fagot (1956), reported in Adams and Fagot (1959),
which was designed to test additivity assumptions. Each of 24 subjects was
instructed to take the role of a personnel manager for a corporation and
choose among hypothetical applicants, two at a time, for an executive
position of a specific nature. Applicants were described in terms of just
two characteristics, intelligence and ability to handle people, with four
levels of ability admitted for each characteristic. Each choice of a subject
was represented as an inequality in utility values as described in Sec. 2.3.
The additive model is satisfied if and only if the entire set of 77
inequalities has a solution (43 of the 120 applicant pairs were omitted
because one weakly dominated the other on both characteristics). Analysis of
the data showed that 6 subjects satisfied perfectly the additive model. The
remaining 18 subjects who did not satisfy the additive model surprisingly
enough did not satisfy the simple ordinal model, that is, each of the 18
violated the transitivity axiom at least once. More detailed analysis
suggested that the deviations from either the ordinal or additive model
were due mainly to occasional inattention, for 80% of the subjects had at
most three such "errors" or deviations, that is, at least 74 of the 77
observed choices satisfied the additive model. Only two of the subjects
had more than four such errors.


4.2 Measurement of Utility

Although discussions of the measurement of utility have had a relatively
long history both in economics and psychology, and the proposal to extend
classical psychophysical methods to the measurement of utility goes back
at least to Thurstone (1945), the first real experimental attempt to use any
of these ideas in an actual measurement of utility was by Mosteller and
Nogee (1951). Their experiment was designed to test the empirical
validity of the von Neumann and Morgenstern axiomatization as applied
to alternatives consisting of the gain and loss of small amounts of money,
together with probability combinations of such outcomes. The subject
was presented with bets which he could accept or refuse (the bets were
presented to four or five subjects at a time, but each subject made his own
decisions). If he refused the bet, no money changed hands; if he accepted
it, he either lost 5 cents or won some amount x of money, which one
depending on whether or not a specified chance event E occurred. When
the subject was indifferent between the two options, that is, when he
accepted the bet approximately half the time, then the following equation
was used to calculate the relative utilities of the outcomes:

    u(0¢) = p u(-5¢) + (1 - p) u(x),

where p is the probability of E occurring. Since u is an interval scale,
u(0¢) and u(-5¢) can be fixed. Setting them to 0 and -1, respectively,
gives u(x) = p/(1 - p). Note that p is an objective probability, not a
measured subjective one. Thus, by selecting an event with probability p
and varying the amount x of money until the subject was indifferent
between the options, it was possible to find the amounts of money
corresponding to any fixed utility u = p/(1 - p). Nine utility points were
determined for each subject, ranging from u(-5¢) = -1 to 101. This
constructed utility function was then used to predict the choices of each
subject when he was faced with somewhat more complex options.
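
As a minimal sketch of this computation, the following Python fragment solves the indifference equation for u(x) under the normalization u(0¢) = 0 and u(-5¢) = -1. The function name and the sample probabilities are illustrative assumptions, not values from the experiment.

```python
def utility_at_indifference(p_event: float) -> float:
    """Utility of the winning amount x at the indifference point.

    p_event is the objective probability of the chance event E, on which the
    5 cents are lost. With u(0) = 0 and u(-5 cents) = -1, the indifference
    equation 0 = p * (-1) + (1 - p) * u(x) gives u(x) = p / (1 - p).
    """
    if not 0.0 < p_event < 1.0:
        raise ValueError("p_event must lie strictly between 0 and 1")
    return p_event / (1.0 - p_event)


# Hypothetical probabilities, chosen only to illustrate the formula.
even_bet = utility_at_indifference(0.5)           # loss as likely as win
long_shot = utility_at_indifference(101 / 102)    # win is very unlikely
```

Under these assumptions an even bet places the indifference amount at utility 1, and the largest utility point reported in the text, 101, would correspond to an event of probability 101/102.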

Fourteen subjects completed the experiment, of whom nine were
Harvard undergraduates and five were from the Massachusetts National
Guard. Each subject was given $1.00 at the beginning of each hour of
play with which to gamble. Testing took place over a period of about four
months.

The bets were constructed from possible hands of poker dice. After
three initial sessions, each group of subjects was instructed in how to
calculate the true odds for any hand, and in addition, each was given a
sheet with the true odds for all the poker dice hands used and was,
moreover, "required to keep this sheet in front of him in all future
sessions." Additional information was provided to the subjects about the
number of times a particular hand had been previously played and the
number of times players had won when they selected it during the initial
"learning" sessions.

In order to obtain utility curves for individual subjects, the first step in
the analysis requires that the point of indifference be determined for each
subject for each hand of poker dice used. This means that the amount of
money must be found for which the subject is indifferent as to whether he
accepts or rejects the bet. Mosteller and Nogee (1951, p. 383) described
in clear fashion their procedure for finding this point:

For each hand a range of offers had been made. The proportion of times the
subject elected to play each offer was calculated, and these points were plotted
on ordinary arithmetic graph paper with vertical axis as per cent participation
and horizontal axis as amount of offer in cents. A freehand curve or a broken-line
curve was then fitted to the points. The abscissa value of the point where
this curve crossed the 50 per cent participation line gave in cents the subject's
indifference offer for that hand. In other words, for that hand this calculation
yielded an interpolated offer which the subject would be equally likely to accept
or reject if given the opportunity.

Since the probability p of winning is fixed for a given hand, the utility of the
indifference offer is just (1 - p)/p.
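
The broken-line variant of this procedure can be sketched as follows. This is only an illustration under stated assumptions: the offers, participation percentages, and winning probability are hypothetical, the function names are ours, and straight-line interpolation between adjacent points stands in for whatever curve the original authors drew.

```python
def indifference_offer(offers, participation, level=50.0):
    """Offer (in cents) at which a broken-line curve crosses `level` per cent.

    offers and participation are parallel sequences; the curve is taken to be
    the straight-line segments joining the points sorted by offer.
    """
    points = sorted(zip(offers, participation))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        crosses = (y0 - level) * (y1 - level) <= 0
        if crosses and y0 != y1:
            # Linear interpolation within the segment that brackets the level.
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("participation never crosses the requested level")


def utility_of_offer(p_win: float) -> float:
    """Utility of the indifference offer when p_win is the chance of winning."""
    return (1.0 - p_win) / p_win


# Hypothetical data for one subject and one hand (not from the experiment).
offers_cents = [5, 7, 9, 11, 13]
per_cent_played = [0, 10, 40, 80, 100]
offer = indifference_offer(offers_cents, per_cent_played)
utility = utility_of_offer(0.1)
```

With these made-up points the 50 per cent line is crossed between the 9-cent and 11-cent offers, so the interpolated indifference offer is 9.5 cents; if the hand were won one time in ten, that offer would be assigned a utility of about 9.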

Figure 2 shows how this indifference point was constructed for subject
B-1 and the poker dice hand 55221. (Amounts of money in cents are
plotted on the abscissa.) Note that Fig. 2 approximates rather closely the
perfect step function that is required theoretically by the von Neumann-
Morgenstern axiomatization and, in fact, by all of the standard algebraic
theories for choice among uncertain outcomes.

Utility curves constructed in this manner for two of the student subjects
are shown in Fig. 3. Both these curves support the classical assumption of
decreasing marginal utility, but, as Mosteller and Nogee pointed out, such
apparent support for this economic idea must be interpreted cautiously.
For one thing, the range of bets for the students ran only from 5 cents to
about $5.50 and only up to about $1.00 for the Guardsmen. (For
obvious financial reasons, this restriction of the range is not peculiar to
the Mosteller and Nogee experiment, but is true of all the experiments
Fig. 2. The data for one subject and hand 55221 are plotted to show how the
indifference point is actually obtained. (Abscissa: amount of offer in cents.)
Adapted by permission from Mosteller & Nogee (1951, p. 385).

that have been performed.) For another, the utility curves of the
Guardsmen generally supported the assumption of increasing marginal utility in
the range of bets considered.

Mosteller and Nogee gave the subjects additional offers of a more
complicated sort, which they called doublet hands. In a doublet hand the
subject is presented with two hands of poker dice: if he beats the low
hand, he gets a certain reward; if he beats the high hand, he receives an
additional reward as well; and if he fails to beat either hand, he loses
5 cents. As the authors remarked, the introduction of these doublet hands
caused considerable confusion in the betting patterns of the subjects.
The most definite evidence of this confusion was the increase in what the
authors called the zone of inconsistency, that is, in the zone in which the
subject did not consistently choose to accept or refuse a bet. The presence
of a fairly wide zone of inconsistency, both for the original bets and for the
doublet hands, of course, is not compatible in a literal sense with the von
Neumann-Morgenstern axiomatization. This incompatibility is particularly
reflected by Theorem 17 of Sec. 3.2, which asserts that there is a
unique probability combination of alternatives that is indifferent to a
given alternative. As was already remarked, the existence of zones of
inconsistency is precisely the reason for the development of the probabilistic
theories discussed in Secs. 5 to 8 of this chapter. Apart from the existence
of a larger zone of inconsistency, the constructed utility curves were fairly
successful in predicting behavior with respect to the doublet hands. We do
not attempt a quantitative summary of the data here. The reader is urged
to refer to the original Mosteller and Nogee article, which contains a
large number of interesting side remarks and insights into the behavior of
the subjects, as well as a good deal of additional systematic material we
have not attempted to summarize. Mosteller and Nogee summarize
(p. 403) their experiment in the following fashion:

1. that it is feasible to measure utility experimentally,

2. that the notion that people behave in such a way as to maximize their
expected utility is not unreasonable,

3. that on the basis of empirical curves it is possible to estimate future
behavior in comparable but more complicated risk-taking situations.

We next turn to the experimental studies of Davidson, Suppes, and
Siegel (1957), which in part were undertaken to meet the following three
important criticisms of the Mosteller and Nogee experiment. First,
Mosteller and Nogee failed to check systematically the adequacy of their
utility curves in terms of simple bets of the same type as those used to
construct them. Such checks using simple hands would have provided a
better test of the von Neumann-Morgenstern axioms than those based on
the more complicated doublet hands, which may introduce complications
not embodied in the axiom system. Because of this lacuna, it is somewhat
difficult to evaluate precisely whether or not the Mosteller and Nogee
experiment substantiates a claim that utility can be measured on an interval
scale; for some recent remarks on this point, see Becker, DeGroot, and
Marschak (1964).


The second criticism is that the choices used by Mosteller and Nogee
to estimate the utility functions involved either accepting or rejecting a
gamble. One option always involved playing and therefore taking a risk,
whereas the other resulted in no play. Thus, if participation itself involves
either a negative or a positive utility, the experiment was so designed that
this special feature could produce maximum distortion.

The third criticism is that their analysis, which to some degree was
necessitated by the experimental design, assumed that subjective
probability is equal to objective probability. In spite of the fact that they went
to some pains to brief subjects on the probability considerations involved
in the offers with which they were faced, no clear evidence exists to show
that they were wholly successful in equating the two probabilities from
the subject's standpoint.


In designing a study to meet these three criticisms, Davidson, Suppes,
and Siegel (1957) essentially followed the Ramsey (1931) line of
development for measuring utility and subjective probability. For the moment
we restrict ourselves to their efforts to measure utility.

To begin with, it was necessary to find an event whose subjective
probability is 1/2 (see Sec. 3.3, particularly Eqs. 8 and 9). Initially they
tried a coin, but most subjects in a pilot study showed a preference for
either heads or tails. An event that was found to satisfy to a close
approximation the conditions expressed in Eq. 8 of Sec. 3.3 was produced
by a specially made die that had the nonsense syllable ZOJ engraved on
three faces and ZEJ on the other three faces. Two similar dice were also
used, with WUH and XEQ in one case, and QUG and QUJ in the other.
The syllables selected are ones that, according to Glaze (1928) and others,
have essentially no associative value. Because their method of generating
chance events that have a subjective probability 1/2 has been used in several
studies, some additional remarks may be desirable concerning exactly in
what sense Eq. 8 of Sec. 3.3 has been verified. A problem exists because the
subject must choose one option or the other, and so there can be no direct
evidence of indifference. Using outcomes that varied by amounts of
1 cent, Davidson et al. used a method of approximation that was found
to hold within the 1-cent limit. Consider the following two pairs of options:


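[Two payoff matrices, each with columns Option 1 and Option 2 and rows
ZOJ and ZEJ; the entries, stated in terms of the amounts x, x + 1, and -x,
are not recoverable from this copy.]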



If Option 1 is selected in the left matrix and Option 2 in the right one for
all choices of x, then it is reasonable to infer that the subject is satisfying
Eq. 8 approximately and, thus, that the subjective probability of one
nonsense syllable occurring is approximately equal to that of the other
occurring.

Having found an event with subjective probability approximately equal
to 1/2, the next step was to find a similar approximation to axioms for
utility differences (see the definition in Sec. 2.3). Roughly speaking, the method
used consisted of finding an upper and lower bound for each point of an
equally spaced utility function. Tests of the interval scale property may
then be formulated in terms of these bounds.

A summary of the data from Chapter 2 of Davidson et al., showing the
bounds in cents for fixed points on the utility scale, is given in Table 1.
The 15 subjects for whom data are shown were male students hired
through the Stanford University Employment Service. Four other
subjects were run for whom the data on the subjective event of probability
1/2 were satisfactory, but they did not satisfy the checks required in order to
substantiate a claim of interval measurement and, therefore, their utility
scales are not shown in Table 1.

Eight of the subjects were twice rerun after periods varying from a few
days to several weeks, and two of these were run a fourth time. Bounds
for these remeasured utility curves were, in general, quite similar to those
determined in the initial session. This evidence is encouraging,
particularly in the light of the concern expressed by several investigators
that experimentally determined utility curves are temporally unstable.

It is natural to ask whether the upper and lower bounds found for the
utility points f, c, d, and g in Table 1 enclose the actuarial value, that is, to
the accuracy of the measurement scheme, is the utility function simply

Table 1  Summary of Data Determining Bounds in Cents for Fixed
Points on the Utility Scale with u(-4¢) = -1 and u(6¢) = 1
(Davidson, Suppes, & Siegel, 1957, p. 62)

            Bounds for f    Bounds for c    Bounds for d    Bounds for g
            Where           Where           Where           Where
Subject     u(f) = -5       u(c) = -3       u(d) = 3        u(g) = 5

   1        -18 to -15½     -11 to -10½     11 to 12½       14 to 18½
   2        -34 to -30      -12 to -11      12 to 18        31 to 36
   3        -18 to -11       -8 to -7       10 to 13        14 to 22
   4        -29 to -24      -15 to -14      14 to 1?        25 to 31
   5        -21 to -14      -10 to -9       10 to 12        16 to 24
   6        -25 to -21      -14 to -13      13 to 15        19 to 23
   7        -18 to -7        -7 to -6        7 to 14        10 to 23
   8        -25 to -21      -14 to -13      14 to 17        23 to 28
   9        -35 to -29      -12 to -11      16 to 18        43 to 50
  10        -26 to -20      -15 to -14      14 to 15        20 to 27
  11        -22 to -19      -14 to -13      11 to 13        18 to 22
  12        -21 to -13      -12 to -11       8 to 12        11 to 15
  13        -34 to -23      -14 to -13      13 to 17        23 to 32
  14        -16 to -13      -10 to -9       12 to 15        20 to 24
  15        -12 to -8        -8 to -7        8 to 10        11 to 15

linear in money? Of the 60 pairs of bounds shown in Table 1, 19 of the
intervals are entirely above the linear money value, 21 are entirely below
it, and 20 include it.
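
The comparison behind these counts can be sketched in a few lines of Python. The sketch assumes that the "linear money value" of a utility point is obtained from the straight line through the two scale anchors, u(-4¢) = -1 and u(6¢) = 1; the function names are ours, and only the bounds for subject 2 are used as sample input.

```python
def linear_money_value(u, anchors=((-4.0, -1.0), (6.0, 1.0))):
    """Cents that a utility u would be worth if utility were linear in money.

    anchors are (cents, utility) pairs fixing the scale; the defaults are the
    two normalizing points stated in the caption of Table 1.
    """
    (m0, u0), (m1, u1) = anchors
    return m0 + (u - u0) * (m1 - m0) / (u1 - u0)


def classify_bounds(low, high, u):
    """Place an interval of bounds relative to the linear money value of u."""
    value = linear_money_value(u)
    if low > value:
        return "above"
    if high < value:
        return "below"
    return "includes"


# Bounds for subject 2, keyed by the utility of the point being bounded.
subject_2 = {-5: (-34, -30), -3: (-12, -11), 3: (12, 18), 5: (31, 36)}
result = {u: classify_bounds(lo, hi, u) for u, (lo, hi) in subject_2.items()}
```

On this reading the linear money values of f, c, d, and g are -24, -14, 16, and 26 cents, so subject 2 contributes one interval below the line, two above it, and one that includes it.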


Finally, to avoid the second criticism of the Mosteller and Nogee
experiment, choices in the experiment of Davidson et al. were primarily
between pairs of options, both of which involved uncertainty. It is
interesting, therefore, to note the strong similarity between the shape of
[...] predictions made in the two [...] experimental [...]

Davidson et al. (pp. 75-79) stated two important criticisms of their
experiment. One is that the method is restricted to objects that are
equally spaced in utility, which severely limits its applicability. It is
suitable for essentially continuous commodities, such as money, but it is
quite unsatisfactory for many interesting alternatives that are essentially
not divisible. The other criticism is that the method of approximation
used is cumbersome, and it becomes especially unsatisfactory when a
large number of utility points are sought because the width of the bounds
tends to spread as the number of points is increased.

In the next chapter of their book, Davidson et al. attempted to meet
these two criticisms by using a linear programming model to measure
utility. As was already remarked in Sec. 3.3, when a chance event E* of
subjective probability 1/2 is used, a choice of Option 1 over Option 2 in

            Option 1    Option 2
    E*         a           c
    Ē*         b           d

may be represented by the inequality that is based on the maximization
of expected utility:

    u(a) + u(b) ≥ u(c) + u(d).    (13)

The essential idea is to apply linear programming methods to solve a set
of inequalities of this form. In the experiment, options were built up
from all possible pairs of six different outcomes, but excluding those for
which a prediction follows from ordinal preferences alone. The subject's
choices in the remaining 35 pairs of options therefore determined 35
inequalities in six variables. Only rarely does this set of inequalities have a
solution. Therefore, Inequality 13 was replaced by

    u(a) + u(b) + θ ≥ u(c) + u(d),    (14)

where θ is a constant to be determined.

Using linear programming methods, a solution is sought to a set of
inequalities of this form such that θ is a minimum and such that the
normalizing restriction is met that the smallest interval be of length one.
Intuitively, θ may be thought of as the threshold of preference: if the
utility difference between a and c differs by more than θ from the difference
between d and b, then the larger interval must be chosen, that is, Option
(a, b) is chosen over (c, d). Some additional probabilistic postulates for
the behavior when the difference between two utility intervals is less than
θ were also formulated by these authors, but we do not consider them here.
It must be emphasized that the utility function obtained by this linear
programming approach is not unique: the minimum θ is compatible with
a convex polyhedron of solutions. Because of limited computational
facilities, Davidson et al. reported only the first solution reached, although,
as they remark, the best single choice is probably the centroid of the
polyhedron.
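
To make the role of θ concrete, the sketch below computes, for one fixed candidate utility assignment, the smallest θ for which every observed choice satisfies an inequality of the form of Inequality 14. This is not the linear program itself, which would in addition search over all utility assignments meeting the normalizing restriction; the outcome labels, utilities, and choices are hypothetical.

```python
def minimal_threshold(utilities, choices):
    """Smallest theta >= 0 making every choice consistent with Inequality 14.

    utilities maps outcome labels to candidate utility values.
    choices is a list of ((a, b), (c, d)) pairs, each meaning that the option
    with outcomes (a, b) was chosen over the option with outcomes (c, d).
    """
    theta = 0.0
    for (a, b), (c, d) in choices:
        # Amount by which the chosen option falls short of the rejected one.
        shortfall = (utilities[c] + utilities[d]) - (utilities[a] + utilities[b])
        theta = max(theta, shortfall)
    return theta


# Hypothetical candidate utilities for four outcomes.
candidate = {"r1": 0.0, "r2": 1.0, "r3": 2.0, "r4": 4.0}

consistent = [(("r4", "r1"), ("r2", "r3"))]                 # 4 >= 3, no slack needed
with_reversal = consistent + [(("r2", "r3"), ("r4", "r1"))]  # same pair, opposite choice

theta_consistent = minimal_threshold(candidate, consistent)
theta_reversal = minimal_threshold(candidate, with_reversal)
```

For this candidate the consistent choice needs no threshold at all, whereas adding the reversed choice forces θ up to 1, the size of the interval difference the subject failed to respect.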

To test this linear programming model, the following three-session
experiment was performed with ten music students as subjects and
long-playing records as outcomes. In the first session a utility curve for six
records was determined by the methods just described. In the second
session a utility curve was found for another six records, two of which
were also in the first set, thus permitting the construction of a joint
utility curve. This joint curve could then be used to predict choices


Table 2  Summary of Predictions of Linear Programming and
Ordinal Models (Davidson, Suppes, & Siegel, 1957, p. 92)

               Linear Programming Model                      Ordinal Model
           Clear         Clear
           Predictions   Predictions   Predictions           Predictions   Predictions
Subject    Correct       Wrong         Within θ      Total   Correct       Wrong         Total

   1          30             1            24           55        6             0            6
   2          24             7            24           55       18             2           20
   3          21             0            34           55       11             2           13
   4          23            15            17           55       21            11           32
   5          21            19            15           55       27            19           46
   6           4             0            51           55        0             0            0
   7          10            13            32           55        7            12           19

 Total       133            55           197          385       90            46          136


between untested combinations of the full set of ten records used in the
first two sessions; these predictions were tested in the third session.

Of the 10 subjects, only 7 were able to complete the entire experiment.
The predictions for these 7 subjects are summarized in Table 2. The
first column gives the number of predictions that were clearly correct
("clearly" because the utility differences were greater than the estimated θ).
The second column gives the number of predictions that were clearly
wrong, and the third, those that fell within θ and therefore need a more
detailed probabilistic analysis. The right-hand side of the table gives the
accuracy of the predictions that follow from the simple ordinal model
that is obtained in the obvious way from the rankings given by the
subjects. Unfortunately, a rather large proportion of predictions fell
within θ, thus making it difficult to evaluate the adequacy of the model.
If θ is ignored and one simply predicts the option having the larger of
two utility intervals, then 254 are correct and 126 are wrong. In only five
cases is no prediction made because of exact equality of the intervals.
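
The three-way classification used in the left half of Table 2 can be sketched as follows. The utilities, the value of θ, and the trials are hypothetical, and the function name is ours; the sketch only illustrates how a prediction is sorted once a utility function and a threshold have been estimated.

```python
from collections import Counter


def classify_prediction(utilities, option_1, option_2, chosen, theta):
    """Sort one observed choice into the three categories of Table 2.

    option_1 and option_2 are pairs of outcome labels; chosen is 1 or 2.
    A prediction is made only when the summed utilities differ by more
    than theta; otherwise the choice falls "within theta".
    """
    total_1 = sum(utilities[o] for o in option_1)
    total_2 = sum(utilities[o] for o in option_2)
    difference = total_1 - total_2
    if abs(difference) <= theta:
        return "within theta"
    predicted = 1 if difference > 0 else 2
    return "clearly correct" if predicted == chosen else "clearly wrong"


# Hypothetical utilities, threshold, and observed choices.
utilities = {"A": 0.0, "B": 1.0, "C": 3.0, "D": 6.0}
theta = 1.5
trials = [
    (("D", "A"), ("B", "C"), 1),  # difference 2 exceeds theta, choice agrees
    (("D", "A"), ("B", "C"), 2),  # difference 2 exceeds theta, choice disagrees
    (("C", "A"), ("B", "B"), 1),  # difference 1 does not exceed theta
]
tally = Counter(
    classify_prediction(utilities, o1, o2, chosen, theta)
    for o1, o2, chosen in trials
)
```

A large estimated θ pushes most trials into the third category, which is the insensitivity the text describes.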

Perhaps the major difficulty encountered in applying the linear
programming model to data from this experiment is the fact that only one
subject satisfied the purely ordinal axiom of transitivity, that is, the
ordinal predictions of how the long-playing records would be ordered in
preference in Session 3 were completely verified for only one subject.
Lack of transitivity accounts for the large θ's and the resulting insensitivity
of the model, which is reflected in Table 2.

Davidson et al. pointed out two undesirable features of the model itself.
In the first place, when the inequalities have integer coefficients, as these
do, the application of linear programming methods almost always yields
very simple values for the utility intervals; in fact, there is a strong
tendency for the utility intervals to become identical, a tendency which
seems psychologically unrealistic. For example, under the normalizing
assumption stated previously that the minimum interval be of length one,
all utility intervals of adjacent alternatives in Session 1 were either 1, 1,2, 
or 3. For five atomic intervals and seven subjects, that is, for a total of
35 intervals to be measured, this restricted set of numerical results
certainly seems far too simple.

The second criticism is that the model leads to simple violations of the
sure-thing principle when it is applied to monetary outcomes; specifically,
it predicts that a larger interval will not be chosen with probability one
when both utility differences are less than θ. Thus, to use their example,
when the subject is presented with the following pair of options

            Option 1    Option 2
    E*        50¢         45¢
    Ē*        -4¢         -50¢

the sure-thing principle and common sense choose Option 1. However,
for a θ of any size that choice is not predicted with probability one.
This failing was eliminated in a closely related nonlinear programming
model proposed by Suppes and Walsh (1959). In their setup, inequalities
of the form of Eq. 14 were replaced by nonlinear inequalities of the
following sort:

    η[u(a) + u(b)] ≥ u(c) + u(d),

and the problem was to minimize the threshold parameter η subject to
the restriction that η ≥ 1. Their experimental procedure was very
similar to that of Davidson et al., just described, except that the outcomes
were monetary. Their subjects were eight sailors from an air base.
Without going into their experimental results in detail, suffice it to say that
they did find some predictive superiority of the nonlinear utility model
over the actuarial model. A rather thorough critique of their interpretation
of their results was given by DeGroot (1963), who makes a good case for
claiming that their model shows little real superiority over the actuarial
model.

A probabilistic version of the linear programming model has recently
been developed and applied by Dolbear (1963). His results are discussed
in Sec. 8.4.

In an early experiment, the next following the classic Mosteller and
Nogee study, Edwards (1955) was concerned with the confounding of
utility and subjective probability, and he proposed to test a subjective
expected utility model. Rather than apply one of the rather elaborate
methods of measurement discussed in Sec. 3, he invoked a convenient
but highly simplifying assumption, which, in the notation of Sec. 3.2,


“(HM ^ Nu(4x„), 


(15) 


where y is a reasonably large amount of money like $5 50 c is a small 
amount of money like 10 cents, and x. is the outcome of ne'itLr winmng 
but a nofd’'"^ Naturally we do not expect exact equality to hold 

p perto now the elpeoted utility 

property to Eq 15 and choosing the scale so that n(x.) = 0, we obtain 


ufy) Nu(€) 


(16) 


ll uXf'ritoLtrP"’™"*’ = > Then 

assumption of an interval scal^ "u'nfT/ •''n standard 

subsequently pointed oufhSelf dteTb« 

generality, it mav be shown m’ i assumption is taken in full 

The subjects wem^ rieuLl‘’ ^a‘'““ 

a total of 50 sessions The first H^essmilr "ns run for 

method of utility mcasurem,ni "are devoted to the N-btts 

indifference was similar to that of Mos'teu"’“' determining 

outstanding thing about the regime mibil The most 

degree of linearity, which is miieh ® ourves is their relatively high 
and^Nogee and by Davids„„,Tu;'’pet::id 'I*;" 

the boml^r orcorre™anTmLm«^^^^^^^ oapenment, ,n particular, 
expected utility model, until ^ ‘ho subjective 

bteaiurmgsubj"ectiiep;„babditw^%f““^“‘‘ "lethois for 

on« agam m"emphLu^,rZse°rcad' of utility, we wish 

- - -har 
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points a very penetrating literature exists on the measurement of utility 
within economics. [See the bibliography of Majumdar (1958) for a guide
to these papers and books.] However, much of this literature, in spite
of its acuity, differs considerably from the usual psychological article.
On the one hand, no systematic experimental or empirical results are
presented, and, on the other hand, no systematic formal methods or
results are set forth. A high proportion of these articles consists of acute
observations on what should be the case in the behavior of any reasonable
person, and, as might be expected, much of the discussion is devoted to
normative rather than descriptive concerns. It is nonetheless a trifle
startling to find that Majumdar's entire book, which is devoted to the
measurement of utility, contains no reference to any experiments aside
from a very casual one to Mosteller and Nogee.


4.3 Measurement of Subjective Probability

The bifurcation of these measurement problems into utility and
subjective probability is artificial for several of the studies that we are
summarizing because both have been measured within a single study,
sometimes even within the same experimental session, and predictions
have been made from both measures. On the other hand, several of the
studies about subjective probability evidenced no concern with utility
whatsoever, and so we have been led to make a simple distinction between
the measurement of utility and subjective probability. Moreover, as far
as we know, no experimental studies exist in which extensive measurements
of subjective probability were made first, following the development of
Savage (1954) which was discussed in Sec. 3.3, and then followed by the
measurement of utility. In all the studies with which we are familiar the
order has been reversed: utility measurement has preceded subjective
probability measurement. (In making this statement we do not count as a
quantitative measurement of probability the determination of an event
with subjective probability 1/2.)

The first attempt to measure subjective probability experimentally
was apparently made by Preston and Baratta (1948). Subjects were run in
groups of two or more, and they used play money to bid for gambles in
a simple auction game. The successful bidder was permitted to roll a
set of dice after the bidding was completed. The probability of winning
with the dice corresponded exactly to the probability stated on the card
presented for auction. For example, on a given play subjects might bid
for a prize of 250 points with probability 0.25 of winning. If for this
gamble the average successful bid was 50, then the authors computed the


Fig. 4. Functional relationship between psychological and mathematical probability.
Adapted with permission from Preston & Baratta (1948, p. 188).

psychological probability to be 50/250 = 0.20. From our earlier
discussion, it is clear that this method of computing subjective probability
assumes that the utility of play money is linear with the value assigned
to the money. There is little evidence to support this assumption, but, in
spite of this, the experimental results are of considerable interest.
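
The arithmetic of this method is simple enough to state as a short Python sketch. The function name is ours, and the only data used are the figures of the worked example in the text; the linearity of utility in play money is built in as an explicit assumption.

```python
def psychological_probability(mean_successful_bid: float, prize: float) -> float:
    """Bid-to-prize ratio used as the psychological probability.

    Assumes, as the method itself does, that the utility of play money is
    linear in its nominal value, so the ratio of bid to prize can be read
    directly as a probability.
    """
    if prize <= 0:
        raise ValueError("prize must be positive")
    return mean_successful_bid / prize


stated_probability = 0.25
subjective = psychological_probability(50, 250)
underestimated = subjective < stated_probability
```

For the example in the text the ratio is 0.20, which lies below the stated probability of 0.25.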

Using this method of computation, perhaps their most interesting
conclusion was that objective probabilities less than 0.20 are systematically
overestimated and objective probabilities greater than 0.20 are
systematically underestimated. The mathematical or objective probabilities of
the gambles they offered and the psychological probabilities computed
from the bid data are shown in Fig. 4. The point of intersection of the
subjective and objective probability curves is about 0.20. Other related
results are referred to shortly.

Two other conclusions are noteworthy. First, by comparing the data
of sophisticated subjects with those of subjects who were relatively naive
about the facts of probability, they concluded that the underestimating
and overestimating effects just described exist in both kinds of subjects.
Second, an increase in the number of players, and therefore in the
competition among them, tended to increase the underestimation of high
probabilities and the overestimation of low ones, which is a somewhat
surprising result.

A ready-made, real-life comparison of subjective and objective
probabilities is provided by the betting on horses under the pari-mutuel system
as compared with the objective probabilities of winning as determined a
posteriori after the races. The total amount of money bet on a horse
divided by the net total bet on all horses in a race determines the odds
and thus the collective subjective probability that the horse will win.
Griffith (1949) examined data from 1386 races run in 1947; his results are
in remarkable qualitative agreement with those obtained by Preston and
Baratta. Low objective probabilities are systematically overestimated by
the subjective probabilities, and high objective probabilities are
systematically underestimated. For Griffith's data, objective and
subjective probability are equal at 0.16, which is close to the 0.20 value
obtained by Preston and Baratta. Griffith remarks in a final footnote
that the same computations were carried through for all races run at the
same tracks in August 1934, and essentially the same results were obtained.
In this case the indifference point fell at 0.18 rather than at 0.16. The
invariance of the results from the depression economy of 1934 to the
relatively affluent one of 1947 increases their significance.

A still more extensive study, which is very similar to Griffith's, was
made by McGlothlin (1956) of 9605 races run from 1947 to 1953, mostly
on California tracks. As one would expect, the general character of his
results agrees closely with that of the earlier study. The objective and
subjective probability curves intersect between 0.15 and 0.22. McGlothlin
also found some interesting tendencies during the course of the usual
eight races on a given day. First, there was an increasing tendency to
overestimate low probabilities, and this phenomenon was particularly
striking for the final race of the day. There seems to be evidence that on
the last race of the day many bettors are uninterested in odds that are not
sufficient to recoup their losses from the earlier races. This phenomenon
is so striking that bets placed on low-odds horses had a net positive
expectation even after subtracting the tracks' take, which is 13% in
California. This kind of dynamic phenomenon is not easily incorporated
into an expected utility model. There was also some evidence that losing
bettors increased the size of their wagers on the next race more than did
winning bettors.

As has already been remarked, Mosteller and Nogee could and did
interpret their experimental results in terms of subjective probability
rather than utility. Their comparison of objective and subjective
probabilities for their two groups of students, as well as the data from
the Preston and Baratta study, are shown in Table 3. These results are not
in qualitative agreement with those of Preston and Baratta and of Griffith.
The Harvard students systematically underestimated objective
probabilities over the entire range of values. (Naturally, this could not
happen if subjective probabilities were required to add up to 1, but this
constraint was not built into the studies we have been considering.) The
Guardsmen, on the other hand, equated subjective and objective
probability at about p = 0.50, which is considerably higher than the
corresponding points obtained by Preston and Baratta, Griffith, and
McGlothlin. We join Mosteller and Nogee in being at a loss to explain the
differences in these results.


Table 3. Comparison of True and Psychological
Probabilities from Preston and Baratta (1948)
and for the Student and Guardsmen Groups of
the Mosteller and Nogee Experiment (1951,
p. 397)

Approximate
True                      Students    Guardsmen
Probability     P & B     N = 10      N = 5

0.667           0.55      0.54        0.56
0.498           0.42      0.47        0.50
0.332           0.26      0.30        0.36
0.167           0.15      0.16        0.28
0.090           0.14      0.081       0.18
0.047           0.12      0.038       0.083
0.010           0.07      0.0085      0.052


We noted in Sec. 4.2 that Edwards (1955) measured subjective
probability by using the utility curves constructed by his N bets method
and the familiar equation for indifference between a certain gain of a
fixed amount and a probability p of winning another amount or obtaining
nothing. Under the utility normalization chosen (see p. 320), the
subjective probability of the event having objective probability p is just
the obvious ratio of the two utilities. He devoted Sessions 13 to 18 to
subjective probability measurements, using both negative and positive
outcomes. Edwards' results were different from any previously mentioned.
When the outcomes were positive, the five subjects almost uniformly
overestimated objective probabilities in the probability range tested,
namely, from [...] to [...]. When the outcomes of the bets were negative
(losses), the subjective values were extremely close to the objective ones.
Edwards used the constructed utility and subjective probability functions
to predict choices in Sessions 19 to 22. The subjective expected utility
model he applied, which was described on p. 320, predicted significantly
better than chance the choice behavior of subjects in these four sessions.
One hundred and twelve offers were made of each of four bets at each of
[...] determined by their monetary outcomes and objective probabilities.
Of the resulting 2240 offers, the subjective expected utility (SEU) model
correctly predicted the choice made in 1629 cases. Edwards compared these
predictions with those obtained from the expected utility (EU) model that
utilizes objective probability and measured utility and with the
subjective expected monetary (SEM) model that utilizes monetary value and
subjective probability. The results for the EU model were slightly worse
than chance, and therefore very unsatisfactory. The SEM model, on the
other hand, did slightly better than the SEU model, but not at a level
that is statistically significant. These results argue that the difference
between subjective and objective probability is of considerably greater
importance than that between utility and monetary outcome. Some extensions
of Edwards' findings are reported by Becker (1962).
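
To make the distinction among the three models concrete, the following
sketch values a single bet under each of them. The subjective probability
function `s` and the utility function `u` are hypothetical stand-ins, not
Edwards' measured functions, and the bet is assumed to pay a positive
amount or nothing, with the utility of nothing set to 0.

```python
import math


def bet_value(p_win, amount, weight, utility):
    """Value of 'win `amount` with probability p_win, else nothing' (utility(0) = 0)."""
    return weight(p_win) * utility(amount)


def identity(value):
    return value


def s(p):
    """Hypothetical subjective probability function."""
    return 0.1 + 0.7 * p


def u(money):
    """Hypothetical utility of money."""
    return math.sqrt(money)


bet = (0.25, 4.00)
eu = bet_value(*bet, weight=identity, utility=u)    # objective probability, measured utility
seu = bet_value(*bet, weight=s, utility=u)          # subjective probability, measured utility
sem = bet_value(*bet, weight=s, utility=identity)   # subjective probability, money value

# Proportion of offers the SEU model predicted correctly, from the counts in the text.
seu_hit_rate = 1629 / 2240
```

Each model predicts that the bet with the larger value is chosen; the
counts quoted above put the SEU model's success at about 73% of the offers.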

Davidson, Suppes, and Siegel (1957, Chapter 2) used their constructed
utility curves (Sec. 4.2) and Eq. 10 of Sec. 3.3 to measure subjective
probability. For seven of their subjects they made three different
determinations of the subjective probability of an event whose objective
probability was 0.25. This event was generated by a four-sided die (two
opposite ends were rounded so that it always landed on one of the other
four sides). Using the method of approximation described in Sec. 4.2,
they found upper and lower bounds for each subject and for each
combination of utilities used. For five of the seven subjects the three
sets of lower and upper bounds had a nonempty intersection, and for two
they did not. For four of the five subjects with a nonempty intersection,
the upper bound was below 0.25, and for the remaining subject the lower
bound was below and the upper bound above 0.25. These results agree
with those of Preston and Baratta, Griffith, and McGlothlin concerning
the underestimation of probabilities above 0.20.

Among the experimental studies devoted to the measurement of
subjective probability, this is the only one in which the procedure used
could admit the conclusion that subjective probability cannot be
consistently measured, although a number of experimental studies are cited
in Sec. 4.4 which cast doubt on the possibility of such measures. The
point is that a claim of fundamental measurement can only really be made
when it is clear that enough data have been collected so that it is
possible to reject the model. The systematic pursuit of the cross-checks
implied by any model of fundamental measurement seems to have been unduly
neglected in most experimental studies of subjective probability.

Toda (1951, 1958) has proposed a two-person game method for measuring
subjective probability which is very similar to the auction procedure
of Preston and Baratta. One of its more interesting applications was
made by Shuford (1959) to obtain extensive measurements of subjective
probability of both elementary and compound events. Subjects rolled a
20-face die (with the integers 1 to 20 on the faces) twice to select the
row and column of a 20 × 20 matrix of vertical and horizontal bars. The
subjects, 64 airmen in training at an airbase, were run in pairs. The
sequence of events on a given trial was as follows. The matrix of
horizontal and vertical bars was projected on a screen. Subject A wrote
down his bid x that a horizontal bar, say, would be selected
(0 ≤ x ≤ 10); subject B decided to “buy” or “sell” A's bid; the die was
rolled by the experimenter to decide the bet, that is, which element of
the projected matrix was selected; and finally the subjects scored
themselves for the play. It can be shown that A's minimax strategy in this
game is to set x equal to 10 times his subjective probability of the
favorable outcome.
Two games were played with each pair of subjects. In the first game the
payoff was determined by the occurrence of a horizontal or vertical bar,
as the case might be, in the position of the matrix selected by the two
rolls of the 20-face die. In the second game, the bet paid off if two
successive selections of elements of the matrix made with the die resulted
in two bars of the same type.
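
The minimax claim can be illustrated under an assumed payoff rule. The
exact scoring used in the experiment is not given in the text, so the rule
below is only a plausible stand-in: if B buys, A receives the bid x and
pays 10 when the event occurs; if B sells, the roles are reversed. Under
that assumption A's worst-case expected gain is −|x − 10p|, which is
largest at x = 10p.

```python
def worst_case_expected_gain(x, p, stake=10.0):
    """A's expected gain against B's better reply, under the assumed payoff rule."""
    if_b_buys = x - stake * p     # A receives x, pays `stake` if the event occurs
    if_b_sells = stake * p - x    # A pays x, receives `stake` if the event occurs
    return min(if_b_buys, if_b_sells)


def minimax_bid(p, stake=10.0, steps=1000):
    """Grid search for the bid that maximizes A's worst-case expected gain."""
    grid = [stake * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda x: worst_case_expected_gain(x, p, stake))


best = minimax_bid(0.25)   # A's subjective probability is 0.25
```

Any other bid lets B pick the side that costs A money in expectation,
which is why the bid can be read as a report of subjective probability.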

Shuford presented the results for individual subjects, but as these
are too complicated to give here, we content ourselves with some summary
observations. Confirming the earlier findings of Preston and Baratta,
Griffith, and McGlothlin, a fairly large fraction of the subjects
overestimated low probabilities and underestimated high ones. This was
true for both the elementary and the compound events. On the other hand,
Shuford found that the subjective probability estimates of a number of the
subjects were fit quite well by a linear function of objective probability,
although the slope and intercept of this function varied from one subject
to another. His findings about the estimation of compound events are
particularly interesting, for, to our knowledge, this is the only
experiment explicitly concerned with this issue. A majority of the
subjects approximated the correct rule, that is, they estimated the
probability of the compound event as approximately the square of the
probability of the elementary event. That the application of the correct
rule was so common is surprising because when the subjects were asked at
the end of the series of trials what rule they had used, only two stated
the correct one. Those who did not approximate the correct rule came very
close to approximating the probability of a single elementary event. For
at least two subjects, however, no simple rule seemed to account for their
estimates of the probability of compound events. As Shuford remarked, the
investigation of the compounding rules used in other experimental
situations is of considerable importance.

Finally, we mention a series of studies that, although not explicitly
concerned with the measurement of subjective probability, bear a rather
close relation to it. The common thread in this work is a scale of
proportion that is based on the responses of subjects to randomly composed
arrays of two or more types of elements, usually in a relatively difficult
perceptual situation. The specific references are Philip (1947), Shuford
(1961a, b), Shuford and Wiesen (1959), Stevens and Galanter (1957), Wiesen
(1962), and Wiesen and Shuford (1961). The three papers involving Wiesen
are of particular interest because of the emphasis on Bayesian estimation
schemes.


4.4 Experimental Studies of Variance-Preference,
Utility-of-Gambling, and Other Models

A number of investigators have attempted either to show that models
of choice behavior based only on the concepts of subjective probability
and utility cannot possibly work, because, for example, of an interaction
between utility and probability, or to establish that alternative models,
in fact, do a better job of prediction. We review briefly the studies most
relevant to the central concerns of this chapter.

Probably the most extensive set of empirical studies that have attempted
to isolate factors in gambling that are not easily accounted for by an
expected utility model were performed by Edwards. In the first paper of
the series (1953), probability preferences in gambling were studied using
12 undergraduates as subjects who selected between bets of equal expected
monetary value. The chance events were generated by a pinball machine;
the objective probabilities varied by [...] steps from [...] to [...].
Edwards found that two main factors determined choices between bets of
equal expected value. First, there was a general tendency either to prefer
or to avoid long shots, that is, bets with a low probability of winning or
losing; and second, there were personal preferences for specific
probabilities, the two most important being a definite preference for the
[...] probability and an aversion to the [...] probability of winning. The
existence of specific probability preferences certainly raises
difficulties for any axiomatization of utility and subjective probability
along the lines discussed in Sec. 3.
In Edwards (1954a) the methods of the earlier study were extended to bets
having different expected monetary values. The results tended to show
that subjects' choices were influenced by preferences for bets with higher
expected values (or lower negative expected values) as well as by
preferences among the probabilities. Edwards (1954b) replicated and
extended slightly the findings of the two earlier studies. In particular,
one experiment showed a similar pattern of preferences among probabilities
in a nongambling game that involved uncertainty. The exact details of the
game are too complicated to describe here; suffice it to say that it was
conceptualized to the subjects as a strategic game of military warfare in
which they were to act as the head of an amphibious task force whose
mission was to destroy ammunition stored at one of eight ports.
Edwards (1954c) asked whether or not variance preferences exist in
gambling. Using the pinball game mentioned, some preferences for
variances were exhibited when the probabilities were held constant, but
on the whole these seemed to be secondary in importance to probability
preferences. The same pattern of probability preferences found in Edwards'
earlier work was again evidenced.

These studies of Edwards are essentially negative in character in the
sense that no well-formulated alternative models are set forth to account
for the experimental findings. Moreover, most of the experiments involved
the assumption either of objective probabilities or of utilities linear
with money, and, therefore, a more detailed critique is necessary to
refute the subjective expected utility model. On the other hand, some of
the preferences Edwards observed are so striking that it seems clear that
no simple SEU model can adequately explain his findings.

[...] of variance preferences. The [...] individuals make choices on the
basis not [...] outcome, but also on the dispersion of the possible [...].
The issues here are thoroughly [...], and [...] of possible
classifications can be devised: [...] the variance of the objective
distribution, a linear function of the mean and variance of the objective
distribution, a linear function of the mean and variance of the subjective
distribution, etc. Perhaps the most constructive approach is to [...]
objective probabilities and to compare that [...] expected utility model.
This tack was followed by Coombs and Pruitt (1960), who suggested, as an
alternative to the expected utility model, that [...] of its objective
probability distribution over money. However, this criticism of utility
theory is faulted when parametric cardinal utility functions are used.
For example, a polynomial utility function with three degrees of freedom
would make a natural comparison to their function of expectation,
variance, and skewness.

They conducted a study with 99 undergraduates who chose among bets
having a constant expectation but different skewnesses and variances.
A typical choice was between

    Bet A: 1/3 to win $1.40, 2/3 to lose 70¢,

    Bet B: 1/2 to win $1.00, 1/2 to lose $1.00.

All bets had an expected monetary value of 0. From the analysis of their
data, Coombs and Pruitt drew two principal conclusions of relevance here.
First, variance preference orderings were limited almost exclusively to
those that could be generated by unfolding the natural ordering of the
variance scale, using the unfolding technique described at the end of
Sec. 2.4. Approximately one-third of the subjects preferred low variance,
one-third high variance, and one-third intermediate degrees of variance.
Second, the great majority of subjects had their ideal positions at one end
or the other of the skewness scale.
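
The remark that a three-parameter polynomial utility is a natural
competitor to a function of expectation, variance, and skewness can be
checked directly: the expectation of a cubic utility depends on a bet only
through its first three moments. The sketch below uses the two bets of the
typical choice, in cents; the probabilities 1/3, 2/3 and 1/2, 1/2 are the
ones forced by the zero-expectation condition, and the cubic coefficients
are arbitrary illustrative values.

```python
from fractions import Fraction as F


def moments(bet):
    """Mean, variance, and third central moment of a list of (probability, outcome)."""
    mean = sum(p * x for p, x in bet)
    var = sum(p * (x - mean) ** 2 for p, x in bet)
    third = sum(p * (x - mean) ** 3 for p, x in bet)
    return mean, var, third


def expected_cubic_utility(bet, a, b, c):
    """E[a*x + b*x^2 + c*x^3], computed directly from the outcomes."""
    return sum(p * (a * x + b * x**2 + c * x**3) for p, x in bet)


def cubic_utility_from_moments(mean, var, third, a, b, c):
    """The same expectation, written in terms of the first three moments only."""
    return (a * mean
            + b * (var + mean**2)
            + c * (third + 3 * mean * var + mean**3))


bet_a = [(F(1, 3), 140), (F(2, 3), -70)]     # outcomes in cents
bet_b = [(F(1, 2), 100), (F(1, 2), -100)]

coefficients = (F(1), F(-1, 1000), F(1, 10**6))   # arbitrary cubic utility
```

Both bets have expectation 0, Bet B has the larger variance, and only
Bet A is skewed, so a subject with a cubic utility and a subject who
weighs variance and skewness directly can be hard to tell apart on such
bets.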

Pruitt (1962) extended the Coombs and Pruitt approach to a pattern
and level of risk (PLR) model. According to Pruitt's definition, two bets
have the same pattern if one may be obtained from the other by multiplying
the outcomes by a positive constant. Thus, the following two bets have the
same pattern:

    Bet A: [...] chance to win $1.00
           [...] chance to lose 75¢
           [...] chance to lose 60¢

    Bet B: [...] chance to win $3.00
           [...] chance to lose $2.25
           [...] chance to lose $1.80

His definition of the level of risk of a bet is the sum of its negative
outcomes weighted by their respective probabilities of occurrence. For
instance, the level of risk of Bet A is 35 cents, and of Bet B, $1.05.
Pruitt argued that these two aspects of bets are the ones that people
usually perceive, and he was led to define the utility, u(A), of an
Alternative or Bet A by the following equation:

    u(A) = r(A) p(A) + g(A),

where r(A) is the level of risk of A, p(A) is the utility of the pattern of
A, and g(A) is the utility of risk of A. Pruitt did not attempt to derive
this somewhat arbitrary equation from simple behavioral assumptions
analogous to those considered in Sec. 3. Rather, he derived several
consequences of this equation combined with some Coombsian assumptions
about the individual possessing an “ideal” level of risk. Typical
examples are the following two propositions, which he tested on the
Coombs and Pruitt (1960) data mentioned previously and on the data
from a similar experiment with 39 undergraduates as subjects. The
first proposition asserts that the order of preference among patterns is
independent of the level of risk. The second proposition asserts that the
more preferred a pattern of risk, the higher will be the ideal level of
risk for that pattern. Both propositions are firmly supported by the data
from the two experiments. Pruitt's article is also recommended to the
reader because of its useful and incisive review of the strengths and
weaknesses of various expected utility models, all of which have already
been mentioned in this chapter.
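
Pruitt's two descriptive quantities are easy to compute. In the sketch
below the monetary outcomes are those of the two example bets (in cents),
but the probabilities 1/2, 1/3, and 1/6 are hypothetical: the printed
fractions are not legible here, and these values were chosen only because
they reproduce the stated levels of risk. The pattern test also assumes
that the two bets list their outcomes in corresponding order.

```python
from fractions import Fraction as F


def level_of_risk(bet):
    """Sum of the losses weighted by their probabilities, as a positive amount."""
    return sum(p * -x for p, x in bet if x < 0)


def same_pattern(bet1, bet2):
    """True if bet2 is bet1 with every outcome multiplied by one positive constant."""
    if len(bet1) != len(bet2):
        return False
    ratios = set()
    for (p1, x1), (p2, x2) in zip(bet1, bet2):
        if p1 != p2:
            return False
        if x1 == 0 or x2 == 0:
            if x1 != x2:
                return False
            continue
        ratios.add(F(x2) / F(x1))
    return len(ratios) <= 1 and all(r > 0 for r in ratios)


# Hypothetical probabilities; outcomes in cents.
bet_a = [(F(1, 2), 100), (F(1, 3), -75), (F(1, 6), -60)]
bet_b = [(p, 3 * x) for p, x in bet_a]
```

Multiplying every outcome by 3 leaves the pattern unchanged and multiplies
the level of risk by 3, which is the relation between the two example bets.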

A specific utility-of-gambling model has been proposed by Royden,
Suppes, and Walsh (1959). The behavioral postulate of the model is that
individuals choose among options so as to maximize the sum of the
expected monetary value and the utility of gambling. For bets or options
[...] each having probability [...] for each outcome, [...]

[...] the subject attending to different aspects of the choice situation.
Were it possible to specify what it is that causes him to attend to one
aspect rather than another aspect, a deterministic model might very well
be appropriate. As long as these causes are unknown, we may very well
have to be satisfied with probability models.
In contrast to algebraic utility theory, which largely has not been the
product of psychologists, probabilistic theories have had a long
development in psychology and only recently have begun to be welded
together with the algebraic utility ideas. The Fechner-Thurstone
development of probability models for psychophysics and psychoacoustics
corresponds to what in the present context are called strong and random
utility models. The recent history of developments in probability models
is rather tangled, but much of it can be recovered from our references to
specific ideas and results.


The probability models all assume that when X is presented on trial n,
there exists a probability $p_{X,n}(x)$ that $x \in X$ is chosen. The
models in this section include the added assumption that these
probabilities are independent of the trial on which X is presented, so the
subscript n is suppressed. The assumption that these are probabilities and
that a choice is made at each opportunity is summarized by

$$p_X(x) \ge 0 \ \text{ for all } x \in X, \qquad \sum_{x \in X} p_X(x) = 1. \tag{17}$$

[...] probabilities are estimated by presenting X repeatedly among
presentations of a number of other [...]; the [...] of times x is chosen
when X is presented is taken as the estimate of $p_X(x)$.

[...] the problem is to discover what mathematical constraints [...]
probabilities beyond those automatically [...] is a proposal, and [...]
is to decide how accurate the [...]. A number of the theories do not
attempt to describe the probabilities [...] subsets of A but just for the
two-element [...]; for [...] preference theories it is convenient to
write p(x, y) for [...].

5.2 Constant Utility Models

[...] inadequate, the intuitive idea of representing [...] in terms of a
[...] numerical utility function [...] least one case it can be dispensed
with easily [...] probabilities and utility scales relate. [...] constant
utility models discussed here, the [...] fixed numerical function over the
[...] relevant outcomes. The choices are assumed to be governed directly
by some function of the scale [...] response probabilities. In the other
approach, defined by the [...] models of Sec. 5.3, the utility function is
assumed to be [...] determined on each presentation, but once selected the
subject's [...] unequivocally determined by the relevant utilities just as
in [...] algebraic models. Probabilistic behavior arises from the
randomness of the utility function, not from the decision rule. We first
[...]

Perhaps the weakest [...] of the fundamental decision rule of the
algebraic models [...] that simply says that a tendency to choose x over y
exists if [...] than the utility of y. This we [...]

Definition 16. A weak (binary) utility model is a set of binary preference
probabilities for which there exists a real-valued function w over A such
that for all x, y ∈ A,

$$p(x, y) \ge \tfrac{1}{2} \ \text{ if and only if } \ w(x) \ge w(y). \tag{18}$$

If no other requirements are imposed, it is not difficult to see that
w must be an ordinal scale, that is, [...] up to strictly monotonic
increasing transformations.
Every algebraic theory of utility provides us with a weak binary utility
model, and conversely, by means of [...]: if x, y ∈ A, then x ≿ y if and
only if p(x, y) ≥ 1/2.
A set of axioms on ≿ that [...] real-valued, order-preserving utility
[...] be immediately restated into an equivalent set [...] probabilities
using Def. 16. If the axioms on ≿ yield a scale stronger than ordinal,
then clearly the corresponding weak utility model has a scale of exactly
the same strength. Thus, for [...] the axioms of Luce and Tukey [...],
then the equivalent axioms for binary probabilities yield a [...] that
[...] exist real-valued functions [...] if and only if [...], and [...]
are interval scales with a common unit.
Next in level of strength is the Fechnerian model, most familiar from
classical psychophysics (see Chapter 4, Sec. 2).

Definition 17. A strong (or Fechnerian) (binary) utility model is a set
of binary preference probabilities for which there exist a real-valued
function u over A and a cumulative distribution function $\phi$ such that

(i) $\phi(0) = \tfrac{1}{2}$, and

(ii) for all x, y ∈ A for which p(x, y) ≠ 0 or 1,

$$p(x, y) = \phi[u(x) - u(y)].$$

It is clear that, by an appropriate change in $\phi$, u can be transformed
by positive linear transformations without affecting the representation,
and it can be shown (Block & Marschak, 1960, p. 104) that these are the
only possible transformations (that is, u is an interval scale) when A is a
continuum and $\phi$ is strictly increasing.

Theorem 19. Any strong (binary) utility model in which the preference
probabilities are different from 0 or 1 is also a weak utility model, but
not conversely.

PROOF. Suppose that p(x, y) ≥ 1/2; then the monotonicity of $\phi$ and
$\phi(0) = \tfrac{1}{2}$ imply, by (ii) of Def. 17, that u(x) − u(y) ≥ 0.
So u is the weak utility function.

The failure of the converse is easily shown by examples.
Although the idea that strength of preference, as measured by the
binary choice probabilities, should be a monotonic function of a scale of
strength of preference, of utility, is an appealing one, examples have been
suggested that make it dubious as a general hypothesis. The following was
offered by L. J. Savage* as a criticism of the next model to come, but it
applies equally well to the strong binary model.

Suppose that a boy must select between having a pony x and a bicycle y
and that he wavers indecisively between them. According to the strong
utility model, u(x) must be approximately equal to u(y). The bicycle
dealer, trying to tip the scale in his favor, suddenly brings forth another
bicycle z which, although basically the same as y, is better in minor ways,
such as having a speedometer. One can well believe that the boy will still
be indifferent, or nearly so, between x and z. The minor differences
between the two bicycles do not help much to resolve his indecision between
a pony and a bicycle (any bicycle, within reason). According to the model,
we are forced to conclude that the utility difference between the two
bicycles, u(y) − u(z), is small when evaluated in terms of preferences
relative to x. Nevertheless, were the boy [...] choice to the two bicycles,
he might [...] for one over the other, forcing us to [...] difference
u(y) − u(z) is not small. [...] this can happen, and the introspections
[...] suggest that it can, then [...] describe these preferences.

* In correspondence with Luce.

Definition 18. A strict (binary) utility model is a set of binary
preference probabilities for which there exists a positive real-valued
function v over A such that for all x, y ∈ A for which p(x, y) ≠ 0 or 1,

$$p(x, y) = \frac{v(x)}{v(x) + v(y)}. \tag{20}$$

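The scale v is unique up to multiplication by a positive constant, that
is, it is a ratio scale.
The strict binary model has been investigated by a number of authors:
Abelson & Bradley [...], Becker, DeGroot, & Marschak [...], Bradley
(1954a, b, 1955), Bradley & [...].

Theorem [...]. Any strict binary utility model is a strong utility model,
but not conversely.

PROOF. Let v be the strict utility function. Define u = log v + [...] and
let $\phi$ be the logistic distribution function,
$\phi(t) = 1/(1 + e^{-t})$. Then

$$p(x, y) = \frac{v(x)}{v(x) + v(y)} = \frac{1}{1 + \exp\{-[u(x) - u(y)]\}} = \phi[u(x) - u(y)].$$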
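
The identity behind this proof can be checked numerically. The sketch
assumes the standard construction, namely u = log v together with the
logistic distribution function; the scale values are hypothetical.

```python
import math


def strict_binary(v_x, v_y):
    """Binary choice probability of the strict utility model."""
    return v_x / (v_x + v_y)


def logistic(t):
    """Logistic distribution function; note that logistic(0) = 1/2."""
    return 1.0 / (1.0 + math.exp(-t))


v = {"x": 2.0, "y": 0.5, "z": 4.0}               # hypothetical ratio-scale values
u = {name: math.log(value) for name, value in v.items()}

largest_gap = max(
    abs(strict_binary(v[a], v[b]) - logistic(u[a] - u[b]))
    for a in v for b in v if a != b
)
```

Because only the difference u(x) − u(y) enters, adding a constant to u,
which corresponds to multiplying v by a positive constant, leaves every
probability unchanged.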

The failure of the converse is easily shown by examples.
[...] peculiar features. The first, [...] larger presentation [...]
0 and 1 [...] nothing [...] an adequate theory [...] probabilities
different from [...] or 1 models are suitable idealizations to reality, on
a par with the continuous idealizations to discrete phenomena in physics,
and so there really is no problem. Others, for example, Block and Marschak
(1960, p. 122), have cited simple economic examples where small changes in
one component of a commodity bundle change a choice probability from about
1/2 to 1 in a highly discontinuous fashion. Many feel that these apparent
discontinuities are a deep problem in the understanding of preferences.
A generalization of the strict binary model has been suggested that
attempts to overcome both these problems (Luce, 1959). In the original
presentation, the 0-1 difficulty was dealt with in what, to some critics,
seemed a distinctly artificial manner; however, as was shown in Chapter
4, Sec. 3.2, this apparently artificial formulation is implied by the
following simpler assumption.

Definition 19. A set of preference probabilities defined for all the
subsets of a finite set A satisfies the choice axiom provided that for all
x, Y, and X such that x ∈ Y ⊆ X ⊆ A,

$$p_Y(x) = p_X(x \mid Y),$$

whenever the conditional probability exists.

What the choice axiom says is that a choice from Y is just that,
independent of what else may have been available. That is to say, even
though X may have been presented, if we look only at those occasions when
the choices were made from Y, then the probability of choosing x from Y,
$p_X(x \mid Y)$, is exactly the same as the probability of choosing x from
Y, $p_Y(x)$, when only Y was presented in the first place.

Theorem 31. If the choice axiom holds and A is finite, then there exists
a ratio scale v on A such that for any $p_Y(x) \ne 0, 1$,

$$p_Y(x) = \frac{v(x)}{\sum_{y \in Y} v(y)}. \tag{21}$$

PROOF. Equation 24 of Sec. 3.2 of Chapter 4 (p. 221).
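
A small numerical check may help fix the relation between the ratio
representation of Theorem 31 and the choice axiom. The sketch goes in the
easy direction only: it starts from hypothetical scale values v, computes
choice probabilities by the ratio rule, and confirms that conditioning a
choice from X on landing in Y gives the same number as a choice from Y
alone.

```python
def choice_probability(v, x, offered):
    """Ratio rule: v(x) divided by the sum of v over the offered set."""
    return v[x] / sum(v[y] for y in offered)


def conditional_choice_probability(v, x, subset, offered):
    """p_X(x | Y): probability of x from X, given that the choice lies in Y."""
    p_subset = sum(choice_probability(v, y, offered) for y in subset)
    return choice_probability(v, x, offered) / p_subset


v = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 2.0}    # hypothetical scale values
X = ["a", "b", "c", "d"]
Y = ["a", "b"]

direct = choice_probability(v, "a", Y)
conditional = conditional_choice_probability(v, "a", Y, X)
```

The equality holds for any positive v, because the common denominator,
the sum of v over X, cancels in the conditional probability.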

We see that for non-0 or 1 probabilities, the choice axiom generalizes
the strict binary utility model; indeed, the property asserted in Theorem
31 has been called the strict utility model by Block and Marschak (1960).
Of course, the pony-bicycle example discussed is applicable to the strict
utility models because they are special cases of the strong ones.
Nevertheless, it is instructive to see how it violates the choice axiom.
[Substantially [...] by Debreu (1960b) in his review of Luce (1959).]
Let x again denote the pony and let y and z be two nearly identical
bicycles with compensating features so that the boy is indifferent
between each of the pairs, hence [...] the pony is rejected [...].

The argument [...] to the observation (Luce, 1959, pp. 132-133) that we
cannot expect the choice axiom to hold for over-all decisions [...] manner
into two or more intermediate decisions. [...] that such criticisms,
although usually directed toward specific [...], [...] sweeping objections
to all our current preference [...]. They suggest that we cannot hope to
be [...] dealing with preferences until we include some mathematical
structure over [...] set of outcomes that, for example, permits us to
[...] substitutable for one another [...]. Such functional and logical
relations among the outcomes seem to have a sharp control over the
preference probabilities [...] and they cannot long be ignored.
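
The pony and bicycle example can be put in numbers. Under the strict
utility model, indifference within every pair forces equal scale values,
and the ratio rule then gives each of the three alternatives probability
1/3 when all are offered. The scale values below are hypothetical, and the
intuition that the pony should keep a probability near 1/2 when a
near-duplicate bicycle is added is offered as one reading of the example,
not as a quotation from the source.

```python
def strict_choice(v, x, offered):
    """Ratio rule of the strict utility model."""
    return v[x] / sum(v[y] for y in offered)


# Equal scale values: the only way every binary probability can equal 1/2.
v = {"pony": 1.0, "bicycle_y": 1.0, "bicycle_z": 1.0}

pony_vs_one_bicycle = strict_choice(v, "pony", ["pony", "bicycle_y"])
pony_vs_both_bicycles = strict_choice(v, "pony", list(v))
```

The drop from 1/2 to 1/3 is what the model predicts; if introspection
says the pony's chances should hardly change, the choice axiom fails for
sets containing near-substitutes.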


5.3 Random Utility Models

The random utility [...] models than are the [...]. They are the same to
this extent: the subject is assumed to choose the outcome that has the
largest utility value at the time of choice. [...] a utility [...] longer
assumed to stay put [...] function is selected according [...] postulated
probability mechanism and the outcome is determined by it. The most
familiar [...] psychological [...] is Thurstone's model in which the [...]
variate normal distribution functions, [...] more complex mechanisms are
[...].

In the [...] be a finite set of [...] random variable [...] defined on
[...] such that [...]. Then we call U a random [...]. [...] further
assumptions [...] the random variables,

$$\Pr[U(x) \ge U(y),\ y \in Y] = \int_{-\infty}^{\infty} \Pr[U(x) = t,\ U(y) \le t,\ y \in Y - \{x\}]\, dt.$$

If the random variables are independent, then the right-hand side
simplifies to

$$\int_{-\infty}^{\infty} \Pr[U(x) = t] \prod_{y \in Y - \{x\}} \Pr[U(y) \le t]\, dt.$$

Definition 20. A random utility model is a set of preference probabilities
defined for all subsets of a finite A for which there is a random vector U
on A such that for x ∈ Y ⊆ A,

$$p_Y(x) = \Pr[U(x) \ge U(y),\ y \in Y]. \tag{22}$$

If the definition is only asserted for the binary preference probabilities,

$$p(x, y) = \Pr[U(x) \ge U(y)],$$

then the model is called a binary random utility model. If the random
vector U consists of components that are independent random variables,
then we say the model is an independent random utility model.
The primary theoretical results so far obtained about random utility
models establish relations with the constant utility models, which we now
present, and with certain observable properties, which we discuss in
Sec. 5.6.

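Theorem 32. For choice probabilities different from 0 and 1, any strict
utility model is an independent random utility model, but not conversely.
PROOF. Suppose that v is the scale of the strict utility model. Define U
to be the random vector whose components are independent random
variables with probability densities

$$\Pr[U(x) = t] = \begin{cases} v(x)\, e^{v(x) t}, & \text{if } t \le 0, \\ 0, & \text{if } t > 0. \end{cases}$$

Consider

$$\begin{aligned}
\Pr[U(x) \ge U(y),\ y \in Y]
&= \int_{-\infty}^{\infty} \Pr[U(x) = t] \prod_{y \in Y - \{x\}} \Pr[U(y) \le t]\, dt \\
&= \int_{-\infty}^{0} v(x)\, e^{v(x) t} \prod_{y \in Y - \{x\}} e^{v(y) t}\, dt \\
&= v(x) \int_{-\infty}^{0} \exp\Big[t \sum_{y \in Y} v(y)\Big]\, dt \\
&= \frac{v(x)}{\sum_{y \in Y} v(y)}.
\end{aligned}$$
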

[...] that it is constructive and that it is a good deal shorter.
Note that any monotonic transformation f of U in the preceding example
again provides a random utility interpretation of a given strict utility
model provided that we replace t by $f^{-1}(t)$ in the right side of the
defining equation. It is conjectured that these are the only reasonably
well-behaved examples, but no proof has yet been devised. In any event,
just about any random utility model one cares to write down is not a strict
utility model, so the failure of the converse of the theorem is easy to
establish.
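
The construction in Theorem 32 can be checked by simulation. The sketch
below assumes one density that makes the argument go through: each U(x) is
the negative of an exponential random variable with rate v(x), so that
its density is v(x) exp[v(x) t] for t ≤ 0 and 0 otherwise. The scale
values, trial count, and seed are arbitrary.

```python
import random


def simulate_choice_frequencies(v, trials=100_000, seed=0):
    """Frequency with which each alternative has the largest sampled utility."""
    rng = random.Random(seed)
    wins = {x: 0 for x in v}
    for _ in range(trials):
        # U(x) = -E, with E exponential of rate v(x); components are independent.
        draws = {x: -rng.expovariate(rate) for x, rate in v.items()}
        wins[max(draws, key=draws.get)] += 1
    return {x: count / trials for x, count in wins.items()}


v = {"a": 1.0, "b": 2.0, "c": 3.0}      # hypothetical strict utility scale
observed = simulate_choice_frequencies(v)
predicted = {x: value / sum(v.values()) for x, value in v.items()}
```

With these values the predicted probabilities are 1/6, 1/3, and 1/2, and
the simulated frequencies should agree with them up to sampling error.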

Although certain other relations exist among the random, weak, and
strong utility models that are easy to work out directly, we shall not do
so because they can be deduced from later results using only the
transitivity of implication. It is sufficient to note the results here.

There are (binary) random utility models that are not weak, and
therefore not strong, utility models.

There are strong, and therefore weak, utility models that are not random
utility models.

The question also arises how the strong and the random models are
related when both are binary. No complete answer seems to be known;
however, the following is of interest.

Theorem 33 Any strong utility model for which the distribution function 
is the difference distribution of /» o independent and identically distributed 

random variables is a binary random utility model 
PROOF Let u be the strong utility function and define the random 
variable Ufa:) = u{x) + c(a;) where for x^y, cfx) and e^) are inde- 
pendent and identically distributed and ^ is the distribution function of 
€(r) — e(y) Then 

^ Tim = ^ uip) -h £(y)] 

= Tr[e(y) — c(a:) ^ u(x) - u(y)J 
~ — u(y)] 

= y) 


5.4 Observable Properties

None of the models just described has ever been tested experimentally
in any complete fashion. Indeed, save for the choice axiom, they are all
stated in terms of nonobservable utility functions, and so it is impossible
to test them completely until we know conditions that are necessary and
sufficient to characterize them in terms of the preference probabilities
themselves, for only these can be estimated from data. No results of this
generality are now known, and so experimenters have been content to
examine various observable properties that are necessary consequences
of at least some of the models.

In this subsection we list several of the more important properties,
although not all that Marschak and his colleagues have investigated
theoretically; in Sec. 5.5 we cite some of the interrelations among them;
and in Sec. 5.6, the relations between them and the several models
are described.

Consider three outcomes x, y, and z and the corresponding binary
probabilities p(x, y), p(y, z), and p(x, z). If we think of these as represented
in the plane by arrows of lengths p(x, y), p(y, z), and p(x, z), then it is not
unreasonable that we should be able to construct a triangle from them.
That is to say, the length corresponding to the x-z pair should not exceed
the sum of the lengths corresponding to the x-y and y-z pairs. If so,
then the following property must be met by the data.
Definition 21. A set of binary preference probabilities satisfies the triangle
condition if for every $x, y, z \in A$,

$$p(x, y) + p(y, z) \ge p(x, z). \qquad (23)$$


Marschak (1960, p. 317) attributes the triangle condition to Guilbaud
(1953). It is in many ways the weakest property that has been suggested.
The next group of ideas are all attempts to extend to the probabilistic
situation the notion that preferences are transitive. The term “stochastic”
used in the definition is inappropriate (see the discussion in Sec. 1.3),
but it is so ingrained in the literature that we make no attempt to change it
here.

Definition 22. Whenever $\min[p(x, y), p(y, z)] \ge \tfrac{1}{2}$, the binary preference
probabilities are said to satisfy

(i) weak (stochastic) transitivity provided that

$$p(x, z) \ge \tfrac{1}{2};$$

(ii) moderate (stochastic) transitivity provided that

$$p(x, z) \ge \min[p(x, y), p(y, z)]; \qquad (24)$$

(iii) strong (stochastic) transitivity provided that

$$p(x, z) \ge \max[p(x, y), p(y, z)]. \qquad (25)$$

Marschak (1960, p. 319) attributes the definitions of weak and strong
transitivity to Valavanis-Vail (1957) and that of moderate (sometimes
called mild) transitivity to Georgescu-Roegen (1958) and Chipman (1958).
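The triangle condition and the three transitivity conditions are simple to test on a table of estimated binary probabilities. The sketch below is one way to do so, assuming the table is stored as a dictionary keyed by ordered pairs and that p(b, a) = 1 - p(a, b). The helper names and the numerical data are hypothetical.

```python
from itertools import permutations


def make_table(half):
    """Complete a binary table from one entry per pair, using p(b, a) = 1 - p(a, b)."""
    p = {}
    for (a, b), value in half.items():
        p[(a, b)] = value
        p[(b, a)] = 1.0 - value
    return p


def triangle(p, items, tol=1e-12):
    """Definition 21: p(x, y) + p(y, z) >= p(x, z) for every ordered triple."""
    return all(p[(x, y)] + p[(y, z)] >= p[(x, z)] - tol
               for x, y, z in permutations(items, 3))


def transitivity(p, items, kind, tol=1e-12):
    """Definition 22; kind is 'weak', 'moderate', or 'strong'."""
    for x, y, z in permutations(items, 3):
        if min(p[(x, y)], p[(y, z)]) >= 0.5:
            bound = {"weak": 0.5,
                     "moderate": min(p[(x, y)], p[(y, z)]),
                     "strong": max(p[(x, y)], p[(y, z)])}[kind]
            if p[(x, z)] < bound - tol:
                return False
    return True


items = ["x", "y", "z"]
# Hypothetical data: weakly and moderately, but not strongly, transitive.
p = make_table({("x", "y"): 0.7, ("y", "z"): 0.8, ("x", "z"): 0.75})
print(triangle(p, items),
      [transitivity(p, items, k) for k in ("weak", "moderate", "strong")])
```

The tolerance guards only against rounding in the complements; with real data one would presumably replace these exact comparisons by statistical tests.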

[...] a multiplicative condition on triples of binary
probabilities, which Morrison (1963) modified slightly by requiring the
same hypothesis as in the stochastic transitivity conditions. We adopt
Morrison’s form of the definition.

Definition 23. A set of binary preference probabilities satisfies the
multiplicative condition if for every $x, y, z \in A$ such that
$\min[p(x, y), p(y, z)] \ge \tfrac{1}{2}$,

$$p(x, z) \ge p(x, y)\, p(y, z).$$

We turn next to a binary property, due to Davidson and Marschak
(1959), which involves four elements and which has already been discussed
for algebraic models in Sec. 2 [...]. [...] the strong utility
model. Suppose that that model holds; then, by monotonicity,
$p(w, x) \ge p(y, z)$ is equivalent to $u(w) - u(x) \ge u(y) - u(z)$, which
in turn is equivalent to $u(w) - u(y) \ge u(x) - u(z)$, which is equivalent
to $p(w, y) \ge p(x, z)$. Thus we are led to consider the following property.

Definition 24. A set of binary preference probabilities satisfies the quadruple
condition provided that $p(w, x) \ge p(y, z)$ implies $p(w, y) \ge p(x, z)$.

Our final binary property [...] Luce (1959). It says, in [...]

[...] however, the following result [...]

[...] binary preference probabilities [...] the product rule [...] any
$n \ge 3$ distinct elements $x_1, x_2, \ldots, x_n$, [...] (27)

[...] By the induction hypothesis, Eq. 27 can be [...] by the product [...]
Fig. 5. A summary of the implications (→) and failures of implication (↛) among
models and observable properties on the assumption that the probabilities are different
from 0 and 1 and that the set of all alternatives is finite. The number beside a line
indicates the relevant theorem. The relation between any two concepts can be deduced
using only the transitivity of implication from these relations plus the fact that a binary
model or property never implies a nonbinary one (see text).
hence $p(z, y) > \tfrac{1}{2}$. There are, therefore, two possible (nonvacuous) cases
of the multiplicative condition to consider.

(i) $p(x, z) \ge \tfrac{1}{2}$, in which case our hypothesis implies

$$\begin{aligned}
p(x, z)\, p(z, y) &> [p(x, y) + p(y, z)]\, p(z, y) \\
&= p(x, y)[1 - p(y, z)] + p(y, z)\, p(z, y) \\
&= p(x, y) + p(y, z)[p(z, y) - p(x, y)] \\
&\ge p(x, y),
\end{aligned}$$

because, by hypothesis, $p(x, y) < 1 - p(y, z) = p(z, y)$. Since this
contradicts the multiplicative condition, we turn to the other possibility.

(ii) $p(y, x) \ge \tfrac{1}{2}$, in which case we have

$$\begin{aligned}
p(z, y)\, p(y, x) &= [1 - p(y, z)]\, p(y, x) \\
&> [1 - p(x, z) + p(x, y)]\, p(y, x) \\
&= [p(z, x) + p(x, y)][1 - p(x, y)] \\
&= p(z, x) + p(x, y)[1 - p(x, y) - p(z, x)] \\
&= p(z, x) + p(x, y)[p(y, x) - p(z, x)] \\
&\ge p(z, x),
\end{aligned}$$

because by hypothesis $p(x, z) - p(x, y) > p(y, z) \ge 0$. Since this
contradicts the multiplicative condition, we have shown that the triangle
condition holds.

The converse does not hold, for example, when

p(x, y) = p(y, z) = p(z, x) = [...]

Theorem 36. Neither the multiplicative condition nor weak stochastic
transitivity implies the other.
PROOF. [...] 0.36 satisfy the multiplicative condition but [...]

Theorem 37. Strong stochastic transitivity is strictly stronger than moderate
transitivity, which in turn is strictly stronger than weak transitivity.
PROOF. The implications follow immediately from the definitions. The
failure of the converses is easily shown by simple examples.

Theorem 38. Moderate stochastic transitivity implies the multiplicative
condition, but not conversely.
PROOF. If $\min[p(x, y), p(y, z)] \ge \tfrac{1}{2}$, then moderate transitivity implies

$$p(x, z) \ge \min[p(x, y), p(y, z)] \ge p(x, y)\, p(y, z).$$
The converse is false, for if it were true, then by Theorem 37 the
multiplicative condition would imply weak transitivity, contrary to Theorem 36.
Theorem 39. The quadruple condition implies strong stochastic transitivity,
but not conversely.

PROOF. Suppose that $p(x, y) \ge \tfrac{1}{2}$; then since $p(z, z) = \tfrac{1}{2}$ for any $z \in A$,
the quadruple condition implies $p(x, z) \ge p(y, z)$. A similar argument
using $p(y, z) \ge \tfrac{1}{2} = p(x, x)$ establishes strong transitivity.

The converse does not hold because p(x, y) = [...] = p(x, z) and p(y, z) = [...]
satisfy strong transitivity but not the quadruple condition:

p(x, y) = [...] = p(x, z), but $p(x, x) = \tfrac{1}{2} < p(y, z)$ = [...]

Theorem 40. For binary probabilities different from 0 and 1, the product
rule implies the quadruple condition, but not conversely.
PROOF. Suppose that $p(w, x) \ge p(y, z)$; then rewriting the product rule
on $\{w, x, y, z\}$,

$$\frac{p(z, x)\, p(w, y)}{p(x, z)\, p(y, w)} = \frac{p(w, x)\, p(z, y)}{p(x, w)\, p(y, z)} \ge 1,$$

from which it follows that $p(w, y) \ge p(x, z)$.

Simple counterexamples show that the converse is false.
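Theorems 39 and 40 can be illustrated numerically. The sketch below assumes binary probabilities generated by a hypothetical strict utility scale, which satisfy the product rule and therefore the quadruple condition. It also checks one set of values, chosen here for illustration and not taken from the text, that is strongly transitive but violates the quadruple condition.

```python
from itertools import permutations, product


def binary_from_scale(v):
    """Binary strict utility probabilities p(a, b) = v(a) / (v(a) + v(b))."""
    return {(a, b): v[a] / (v[a] + v[b]) for a in v for b in v}


def table(half, items):
    """Complete a binary table, with p(a, a) = 1/2 and p(b, a) = 1 - p(a, b)."""
    p = {(a, a): 0.5 for a in items}
    for (a, b), value in half.items():
        p[(a, b)] = value
        p[(b, a)] = 1.0 - value
    return p


def product_rule(p, items, tol=1e-12):
    return all(abs(p[(x, y)] * p[(y, z)] * p[(z, x)]
                   - p[(x, z)] * p[(z, y)] * p[(y, x)]) <= tol
               for x, y, z in permutations(items, 3))


def quadruple(p, items, tol=1e-12):
    """p(w, x) >= p(y, z) must imply p(w, y) >= p(x, z); repeats are allowed."""
    return all(p[(w, y)] >= p[(x, z)] - tol
               for w, x, y, z in product(items, repeat=4)
               if p[(w, x)] >= p[(y, z)] - tol)


def strong_transitivity(p, items, tol=1e-12):
    return all(p[(x, z)] >= max(p[(x, y)], p[(y, z)]) - tol
               for x, y, z in permutations(items, 3)
               if min(p[(x, y)], p[(y, z)]) >= 0.5)


v = {"w": 1.0, "x": 2.0, "y": 3.0, "z": 5.0}  # hypothetical scale values
p_strict = binary_from_scale(v)

three = ["x", "y", "z"]
# Illustrative values only (not from the text).
p_counter = table({("x", "y"): 0.75, ("x", "z"): 0.75, ("y", "z"): 2 / 3}, three)

print(product_rule(p_strict, list(v)), quadruple(p_strict, list(v)))
print(strong_transitivity(p_counter, three), quadruple(p_counter, three))
```

In the second table the quadruple condition fails because p(x, y) >= p(x, z) would require p(x, x) = 1/2 >= p(y, z), which is false for the values chosen.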


5.6 Relations between the Models and the Observable
Properties


Following the plan shown in Fig. 5, we next establish the basic
connections between the models and the observable properties. We begin
with the weaker models and work up to the stronger ones.
Theorem 41. Any random utility model is regular, but not conversely.
PROOF. Let $x \in X \subseteq Y$; then by Def. 20,

$$\begin{aligned}
p_X(x) &= \Pr[U(x) \ge U(y),\ y \in X \subseteq Y] \\
&\ge \Pr[U(x) \ge U(y),\ y \in Y] \\
&= p_Y(x).
\end{aligned}$$

That the converse fails is shown in the corollary to Theorem 49 of Sec. 6.1.

Theorem 42. Neither being a random utility model nor satisfying the
multiplicative condition implies the other.
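PROOF. Consider A = {x, y, z}, let $\epsilon$ be a number such that $0 < \epsilon < 1$,
and let $U = \langle U(x), U(y), U(z) \rangle$ be the random vector with the following
distribution:

$$\Pr(U = u) = \begin{cases} \epsilon & \text{if } u = \langle 3, 2, 1 \rangle, \\ \dfrac{1 - \epsilon}{2} & \text{if } u = \langle 1, 3, 2 \rangle \text{ or } \langle 2, 1, 3 \rangle, \\ 0 & \text{otherwise.} \end{cases}$$

Observe that

$$\begin{aligned}
p(x, y) &= \Pr[U(x) \ge U(y)] \\
&= \Pr[U = \langle 3, 2, 1 \rangle] + \Pr[U = \langle 2, 1, 3 \rangle] \\
&= \epsilon + \frac{1 - \epsilon}{2} \\
&= \frac{1 + \epsilon}{2}.
\end{aligned}$$

In like manner,

$$p(y, z) = \frac{1 + \epsilon}{2}, \qquad p(x, z) = \epsilon.$$

Thus

$$p(x, y)\, p(y, z) - p(x, z) = \Big(\frac{1 + \epsilon}{2}\Big)^2 - \epsilon = \Big(\frac{1 - \epsilon}{2}\Big)^2 > 0,$$

and the multiplicative condition is violated.
The implication the other way fails because the multiplicative condition
restricts only the binary probabilities.

Theorem 43. Neither being a random utility model nor satisfying weak
stochastic transitivity implies the other.
PROOF. [...] let $\epsilon$ be a number such that $0 < \epsilon <$ [...]

$$\Pr(U = u) = \begin{cases} [\ldots] \\ 0 & \text{otherwise.} \end{cases}$$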
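Then,

$$\begin{aligned}
p(x, y) &= \Pr[U(x) \ge U(y)] \\
&= \Pr(U = \langle 3, 2, 1 \rangle) + \Pr(U = \langle 2, 1, 3 \rangle) + \Pr(U = \langle 3, 1, 2 \rangle) \\
&= \epsilon + \tfrac{1}{6} + \epsilon + \tfrac{1}{6} - \epsilon + \tfrac{1}{6} \\
&= \epsilon + \tfrac{1}{2}.
\end{aligned}$$

Similarly,

$$p(y, z) = \epsilon + \tfrac{1}{2},$$

$$p(x, z) = -\epsilon + \tfrac{1}{2}.$$

Clearly, weak transitivity is not met.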
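Both counterexamples can be checked by enumerating a finite distribution over utility vectors. In the sketch below the first distribution is the one given in the proof of Theorem 42. The second is only an assumption: it is one distribution over the six rankings that reproduces the three binary probabilities displayed for Theorem 43. The function name and the value of eps are hypothetical.

```python
def binary_probs(dist, names=("x", "y", "z")):
    """p(a, b) = Pr[U(a) >= U(b)] for a finite distribution over utility vectors."""
    p = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i != j:
                p[(a, b)] = sum(pr for u, pr in dist.items() if u[i] >= u[j])
    return p


eps = 0.1  # hypothetical; any value with 0 < eps < 1/6 suits both examples

# Distribution from the proof of Theorem 42.
d42 = {(3, 2, 1): eps, (1, 3, 2): (1 - eps) / 2, (2, 1, 3): (1 - eps) / 2}
p42 = binary_probs(d42)

# Assumed distribution for Theorem 43: 1/6 + eps or 1/6 - eps on each ranking.
d43 = {(3, 2, 1): 1 / 6 + eps, (2, 1, 3): 1 / 6 + eps, (1, 3, 2): 1 / 6 + eps,
       (3, 1, 2): 1 / 6 - eps, (2, 3, 1): 1 / 6 - eps, (1, 2, 3): 1 / 6 - eps}
p43 = binary_probs(d43)

# Theorem 42: hypothesis of the multiplicative condition holds, conclusion fails.
print(p42[("x", "y")] * p42[("y", "z")] > p42[("x", "z")])
# Theorem 43: p(x, y), p(y, z) >= 1/2 but p(x, z) < 1/2.
print(p43[("x", "y")], p43[("y", "z")], p43[("x", "z")])
```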

Although it is not indicated on Fig. 5, perhaps it is worth noting that
by imposing some fairly strong regularity conditions on the random
utility model it is possible to show that strong stochastic transitivity must
hold and that under slightly stronger conditions we can obtain an
expression for the weak utility function, which by Theorems 37 and 45 must
exist. It will be convenient to use the notation

$$F(t, x, y) = \Pr[U(x) \le t] - \Pr[U(y) \le t],$$

$$f(t, x, y) = \frac{dF(t, x, y)}{dt} = \Pr[U(x) = t] - \Pr[U(y) = t].$$

Theorem 44. Suppose that a set of binary preference probabilities is a
binary independent random utility model.

(i) If for $x, y \in A$, $F(t, x, y)$ is either nonnegative for all t or nonpositive
for all t, then strong stochastic transitivity is satisfied.

(ii) If, in addition, $\lim_{t \to -\infty} t F(t, x, y) = \lim_{t \to \infty} t F(t, x, y) = 0$, then

$$w(x) = E[U(x)] = \int_{-\infty}^{\infty} t \Pr[U(x) = t] \, dt$$

is a weak utility function.
PROOF. We first note that

$$\begin{aligned}
p(x, z) - p(x, y) &= p(y, x) - p(z, x) \\
&= \int_{-\infty}^{\infty} \Pr[U(x) = t]\, F(t, z, y) \, dt.
\end{aligned}$$

Because $F(t, z, y)$ has the same sign for all t and $\Pr[U(x) = t] \ge 0$, it
follows that $p(x, z) - p(x, y)$ has the same sign as $F(t, z, y)$.

(i) Suppose $p(x, y) \ge \tfrac{1}{2}$ and $p(y, z) \ge \tfrac{1}{2}$. Thus

$$0 \le p(x, y) - \tfrac{1}{2} = p(x, y) - p(x, x),$$

$$0 \le p(y, z) - \tfrac{1}{2} = p(y, z) - p(y, y),$$

and so $F(t, y, x) \ge 0$ and $F(t, z, y) \ge 0$. Therefore $p(x, z) - p(x, y) \ge 0$
and $p(x, z) - p(y, z) = p(z, y) - p(z, x) \ge 0$, which proves strong
stochastic transitivity.

(ii) Consider

$$\begin{aligned}
w(x) - w(y) &= E[U(x) - U(y)] \\
&= \int_{-\infty}^{\infty} t f(t, x, y) \, dt \\
&= t F(t, x, y) \Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} F(t, x, y) \, dt \\
&\ge 0 \quad \text{if and only if} \quad -F(t, x, y) = F(t, y, x) \ge 0,
\end{aligned}$$

where we have integrated by parts and used the hypothesis of part (ii).
But

$$p(x, y) - \tfrac{1}{2} = p(x, y) - p(x, x) \ge 0 \quad \text{if and only if} \quad F(t, y, x) \ge 0,$$

which proves that

$$p(x, y) \ge \tfrac{1}{2} \quad \text{if and only if} \quad w(x) \ge w(y).$$

It should be noted that Thurstone's Case V model, which assumes
independent normal distributions with equal variances, satisfies all the
conditions of this theorem.
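The remark about independent normal distributions with equal variances can be illustrated directly. The sketch below assumes that U(a) - U(b) is normal with mean mu_a - mu_b and variance 2 * sigma^2, so that p(a, b) is a normal distribution function of the scaled mean difference. The means and the function names are hypothetical.

```python
import math
from itertools import permutations


def case_v_prob(mu_a, mu_b, sigma=1.0):
    """p(a, b) = Pr[U(a) >= U(b)] for independent normals with common sigma."""
    z = (mu_a - mu_b) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


def strong_transitivity(p, items, tol=1e-12):
    return all(p[(x, z)] >= max(p[(x, y)], p[(y, z)]) - tol
               for x, y, z in permutations(items, 3)
               if min(p[(x, y)], p[(y, z)]) >= 0.5)


def is_weak_utility(p, w, items):
    """w is a weak utility function if p(a, b) >= 1/2 exactly when w(a) >= w(b)."""
    return all((p[(a, b)] >= 0.5) == (w[a] >= w[b]) for a in items for b in items)


means = {"x": 1.0, "y": 0.4, "z": -0.3}  # hypothetical expected utilities
items = list(means)
p = {(a, b): case_v_prob(means[a], means[b]) for a in items for b in items}

print(strong_transitivity(p, items), is_weak_utility(p, means, items))
```

Here the means play the role of w(x) = E[U(x)] in part (ii) of Theorem 44.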

Theorem 45. A weak utility model satisfies weak stochastic transitivity.
The converse is not true in general, but it is true when A is finite.
PROOF. If $p(x, y) \ge \tfrac{1}{2}$ and $p(y, z) \ge \tfrac{1}{2}$, then in a weak utility model
there is a function w such that $w(x) \ge w(y) \ge w(z)$, and therefore
$p(x, z) \ge \tfrac{1}{2}$.
To show that the converse is false, consider points in the plane,
$x = (x_1, x_2)$ and $y = (y_1, y_2)$, and for $\alpha, \beta > \tfrac{1}{2}$, let

$$p(x, y) = \begin{cases} \alpha & \text{if } x_1 > y_1, \\ \beta & \text{if } x_1 = y_1 \text{ and } x_2 > y_2, \\ \tfrac{1}{2} & \text{if } x_1 = y_1 \text{ and } x_2 = y_2. \end{cases}$$

We first show that these probabilities satisfy weak stochastic transitivity.
Assume $p(x, y) \ge \tfrac{1}{2}$ and $p(y, z) \ge \tfrac{1}{2}$. There are nine cases to
consider; we look at only three because the rest are similar.

1. $p(x, y) = \alpha = p(y, z)$; then $x_1 > y_1 > z_1$, and so $p(x, z) = \alpha$.

2. $p(x, y) = \alpha$, $p(y, z) = \beta$; then $x_1 > y_1 = z_1$, so $p(x, z) = \alpha$.

3. $p(x, y) = \beta$, $p(y, z) = \tfrac{1}{2}$; then $x_1 = y_1$, $x_2 > y_2$, $y_1 = z_1$, and $y_2 = z_2$,
so $x_1 = z_1$ and $x_2 > z_2$; hence $p(x, z) = \beta$.

Now, the relation $\succsim$ defined by $x \succsim y$ if and only if $p(x, y) \ge \tfrac{1}{2}$ holds if
and only if $x_1 > y_1$ or $x_1 = y_1$ and $x_2 \ge y_2$. Thus, if these probabilities
were to satisfy the weak utility model, there would have to be an
order-preserving function of this relation, contrary to what we showed in Sec.
2.1; but when A is finite, it is immediately evident that an order-preserving
function exists.

Theorem 46. Neither being a weak utility model nor satisfying the triangle
condition implies the other.

PROOF. For A = {x, y, z} and p(x, y) = 0.1, p(y, z) = 0.6, and p(x, z) =
0.8, the weak utility model holds with w(x) = 1, w(y) = 2, and w(z) = 0,
but the triangle condition does not because p(x, y) + p(y, z) = 0.7 < 0.8 =
p(x, z).

If the converse held, then we would have the chain

multiplicative → triangle → weak utility → weak transitivity,

contrary to Theorem 36.
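The numerical example in the proof of Theorem 46 is easy to verify mechanically. The following sketch uses the probabilities and the function w stated in the proof; the variable names are hypothetical, and p(b, a) = 1 - p(a, b) is assumed for the unlisted entries.

```python
half = {("x", "y"): 0.1, ("y", "z"): 0.6, ("x", "z"): 0.8}
p = {}
for (a, b), value in half.items():
    p[(a, b)] = value
    p[(b, a)] = 1.0 - value  # complementary binary probability

w = {"x": 1, "y": 2, "z": 0}  # weak utility function from the proof

# Weak utility model: p(a, b) >= 1/2 if and only if w(a) >= w(b).
weak_utility_holds = all((p[(a, b)] >= 0.5) == (w[a] >= w[b]) for (a, b) in p)

# Triangle condition for the ordered triple (x, y, z).
triangle_holds = p[("x", "y")] + p[("y", "z")] >= p[("x", "z")]

print(weak_utility_holds, triangle_holds)
```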

Theorem 47. A strong utility model satisfies the quadruple condition, but
not conversely.

PROOF. A proof of the first part was given in the discussion leading to
the quadruple condition.

Debreu (1958) showed that the quadruple condition does not imply the
strong utility model in general, but that it does when the following
condition of stochastic continuity is met: for every $x, y, z \in A$ and every real $\alpha$
such that $p(x, y) < \alpha < p(x, z)$, there exists a $w \in A$ such that $p(x, w) = \alpha$.
Theorem 48. A strict utility model satisfies the product rule, but not
conversely except in the binary case when they are equivalent.
PROOF. To show that being a (binary) strict utility model implies
satisfaction of the product rule, substitute Eq. 20 (p. 335) into both sides of Eq.
26 (p. 341) and cancel the denominators. The converse, which is not true
in general, is true in the binary case. For $a \in A$, define $v(x) = p(x, a)/p(a, x)$;
then from $p(x, y)\, p(y, a)\, p(a, x) = p(x, a)\, p(a, y)\, p(y, x)$, we see that

$$\begin{aligned}
p(x, y) &= \frac{p(x, a)/p(a, x)}{p(x, a)/p(a, x) + p(y, a)/p(a, y)} \\
&= \frac{v(x)}{v(x) + v(y)}.
\end{aligned}$$
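The construction v(x) = p(x, a)/p(a, x) used in the proof can be tried on a small table. The sketch below assumes binary probabilities that satisfy the product rule (generated here from a hypothetical scale) and recovers a scale from them relative to an arbitrary reference element a. Exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction


def binary_from_scale(v):
    """Binary strict utility probabilities p(x, y) = v(x) / (v(x) + v(y))."""
    return {(x, y): v[x] / (v[x] + v[y]) for x in v for y in v}


def scale_from_binary(p, items, anchor):
    """Construction from the proof of Theorem 48: v(x) = p(x, a) / p(a, x)."""
    return {x: p[(x, anchor)] / p[(anchor, x)] for x in items}


true_v = {"a": Fraction(2), "b": Fraction(1), "c": Fraction(4)}  # hypothetical
items = list(true_v)
p = binary_from_scale(true_v)

v = scale_from_binary(p, items, anchor="a")
rebuilt = binary_from_scale(v)

print(v)             # the original scale divided by v(a)
print(rebuilt == p)  # the recovered scale reproduces every binary probability
```

The recovered scale differs from the generating one only by the constant factor 1/v(a), which reflects the fact that a strict utility scale is determined only up to multiplication by a positive constant.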
6. GENERAL PROBABILISTIC RANKING
THEORIES


[...] between theories of choice and of ranking [...]

[...] the possibility that decisions may be governed [...] connections
between choices and rankings [...] no longer at all obvious.
It is a theoretical and experimental problem of some magnitude to try to
find out what they are.

Very little theoretical work has [...] knowledge
there are no [...] is due to Block
and Marschak (1960), Luce (1959, pp. 68-74), and Marschak (1960).
Two general modes of [...] problem have been [...]
ideas that have been put forward.

6.1 A Function Relating Choice to Ranking Probabilities

We begin with a simple example. [...] presented
many times and that the subject is required to rank order the three
outcomes. This gives us [...] the ranking probabilities p(xyz), [...]
the choice probabilities [...]

[...] with probability 1/3, and they are ordered according to the relevant binary
choice probability. One of these two outcomes is then chosen at random,
that is, with probability 1/2, and it is compared with the third outcome
according to the relevant binary choice probability. This may or may not
produce a ranking of the three outcomes; if not, the two not previously
compared are now ordered, which produces a ranking of all three. It is
not difficult to show that

$$p(x, y) = \frac{p(xyz) - p(zxy)}{p(xyz) - p(zxy) + p(yxz) - p(zyx)},$$

which is obviously different from Eq. 28.

Nevertheless, the idea embodied in Eq. 28 seems worth exploration. To
do so, some notation is needed.

Let A denote the finite set of all n outcomes, and let R denote the set of n!
rankings of A. Thus $\rho \in R$ is a particular ranking of A. Denote by $\rho_i$
the element of A that is ranked in the ith position by $\rho$, and denote by
$\rho(x)$ the rank order position of $x \in A$ under the ranking $\rho$. The set of
rankings for which $x \in A$ is placed above all the elements in $Y \subseteq A$ is
denoted by R(x, Y), that is,

$$R(x, Y) = \{\rho \mid \rho \in R \text{ and } \rho(x) < \rho(y), \text{ all } y \in Y - \{x\}\}.$$

Block and Marschak (1960, p. 107) proved the following interesting
result.

Theorem 49. A set of preference probabilities $p_Y$, $Y \subseteq A$, is a random
utility model (Def. 20, p. 338) if and only if there exists a probability
distribution p over the set R of rankings of A such that

$$p_Y(x) = \sum_{\rho \in R(x, Y)} p(\rho). \qquad (29)$$

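PROOF. Suppose, first, that the preference probabilities satisfy a random
utility model. Define

$$p(\rho) = \Pr[U(\rho_1) > U(\rho_2) > \cdots > U(\rho_n)];$$

then

$$\begin{aligned}
p_Y(x) &= \Pr[U(x) > U(y),\ y \in Y - \{x\}] \\
&= \sum_{\rho \in R(x, Y)} \Pr[U(\rho_1) > U(\rho_2) > \cdots > U(\rho_n)] \\
&= \sum_{\rho \in R(x, Y)} p(\rho).
\end{aligned}$$

Conversely, [...] correspondence [...]
and that R is the set of permutations of [...] ordered by magnitude. Now,
for any vector $u = (u_1, u_2, \ldots, u_n)$ with real components $u_i$, we define the
random vector U by

$$\Pr[U = u] = \begin{cases} p(u) & \text{if } u \in R, \\ 0 & \text{if } u \notin R. \end{cases}$$

It is easy to see that

$$\begin{aligned}
p_Y(x) &= \sum_{\rho \in R(x, Y)} p(\rho) \\
&= \sum_{u \in R(x, Y)} \Pr[U = u] \\
&= \Pr[U(x) > U(y),\ y \in Y - \{x\}].
\end{aligned}$$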

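Corollary. A model that satisfies the regularity condition need not be a
random utility model.
PROOF. For A = {1, 2, 3, 4}, we see from Eq. 29 that a necessary
condition for the random utility model to hold is that

$$p_{\{2,3\}}(2) + p_A(2) - p_{\{1,2,3\}}(2) - p_{\{2,3,4\}}(2) = p(1423) + p(4123),$$

so

$$p_{\{2,3\}}(2) + p_A(2) - p_{\{1,2,3\}}(2) - p_{\{2,3,4\}}(2) \ge 0.$$

Let $0 < \epsilon < \tfrac{1}{6}$ and let

$$p_A(2) = \epsilon,$$

$$p_A(i) = (1 - \epsilon)/3, \quad i = 1, 3, 4,$$

$$p_S(i) = \begin{cases} \tfrac{1}{3} & \text{if } S \text{ has 3 elements,} \\ \tfrac{1}{2} & \text{if } S \text{ has 2 elements.} \end{cases}$$

It is clear that these choice probabilities satisfy regularity, but that

$$p_{\{2,3\}}(2) + p_A(2) - p_{\{1,2,3\}}(2) - p_{\{2,3,4\}}(2) = \tfrac{1}{2} + \epsilon - \tfrac{1}{3} - \tfrac{1}{3} < -\tfrac{1}{6} + \tfrac{1}{6} = 0,$$

and so the random utility model cannot hold.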
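The corollary's example can be verified with exact arithmetic. The sketch below assumes the choice probabilities as reconstructed in the proof (eps on alternative 2 in the full set, equal probabilities in every proper subset) and evaluates both regularity and the four-term expression that Eq. 29 forces to be nonnegative. The function names and the value of eps are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

A = frozenset({1, 2, 3, 4})


def choice_prob(S, i, eps):
    """Choice probabilities of the example: unequal only when S is all of A."""
    S = frozenset(S)
    if S == A:
        return eps if i == 2 else (1 - eps) / 3
    return Fraction(1, len(S))


def is_regular(eps):
    """Regularity: p_S(i) >= p_T(i) whenever i is in S and S is a subset of T."""
    subsets = [frozenset(c) for r in range(1, 5)
               for c in combinations(sorted(A), r)]
    return all(choice_prob(S, i, eps) >= choice_prob(T, i, eps)
               for S in subsets for T in subsets if S <= T for i in S)


def necessary_term(eps):
    """Equals p(1423) + p(4123) in any random utility model, hence must be >= 0."""
    return (choice_prob({2, 3}, 2, eps) + choice_prob(A, 2, eps)
            - choice_prob({1, 2, 3}, 2, eps) - choice_prob({2, 3, 4}, 2, eps))


eps = Fraction(1, 10)  # any value with 0 < eps < 1/6
print(is_regular(eps), necessary_term(eps))
```

A negative value of the second quantity is exactly what rules out a random utility representation even though regularity holds.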



[...] of how choice and ranking probabilities [...]

And again there is [...]
might rank order a set of outcomes by first selecting what strikes him as
the best outcome and assigning it rank one; this outcome is then discarded,
and he selects the best outcome from the remaining set and assigns it rank
two, and so on, until the set is exhausted. Once stated, this seems like the
only sensible mechanism, but there are alternatives. For example, he
might simply reverse the procedure: first select the least satisfactory
outcome and rank it last, then the least satisfactory from the remaining set
and rank it next to last, etc. The relation between these two procedures is
considered shortly. Or he might proceed as in the example of Sec. 6.1 by
choosing a pair of elements at random, ordering them, and then comparing one
of these with another outcome chosen at random, etc. Note that this last
scheme is prone to give a partial order or an intransitive relation when
there are more than three elements, but it is not obvious that this is
descriptively wrong. To generate a consistent ranking of numerous
alternatives, for example, of student applications, is not considered particularly
easy by most people who have ever had to do it.
Recall that if $\rho$ is a rank ordering of A, then $\rho_i$ denotes the element of
A that has the ith rank. We may formulate the first model proposed
previously as asserting that for all $\rho \in R$,

$$p(\rho) = p_A(\rho_1)\, p_{A - \{\rho_1\}}(\rho_2) \cdots p(\rho_{n-1}, \rho_n). \qquad (30)$$

Nothing much seems to be known about Eq. 30 by itself, but when it is
coupled with Eq. 29 of Sec. 6.1, Block and Marschak (1960, p. 109) have
proved the following strong result, the first part of which generalizes a
result in Luce (1959, p. 72).

Theorem 50. If a set of preference probabilities $p_Y$, $Y \subseteq A$, is a strict
utility model (Theorem 31, p. 336), then there exist ranking probabilities p
such that p and $p_Y$ satisfy both Eqs. 29 and 30; conversely, if there exist
preference and ranking probabilities that satisfy both Eqs. 29 and 30, then
the set of preference probabilities is a strict utility model.
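PROOF. Suppose that the preference probabilities form a strict utility
model. If we define the ranking probabilities by Eq. 30, then it is sufficient
to show that Eq. 29 holds. Observe, first, that the set R(x, Y) is the union
of the following disjoint sets: R(x, A); the sets of rankings where $z_1 \in
A - Y$ is ranked first and x second; the sets where $z_1 \in A - Y$ is ranked
first, $z_2 \in A - Y - \{z_1\}$ is ranked second, and x third; etc. Thus

$$\begin{aligned}
\sum_{\rho \in R(x, Y)} p(\rho) = p_A(x) &+ \sum_{z_1 \in A - Y} p_A(z_1)\, p_{A - \{z_1\}}(x) \\
&+ \sum_{z_1 \in A - Y} \ \sum_{z_2 \in A - Y - \{z_1\}} p_A(z_1)\, p_{A - \{z_1\}}(z_2)\, p_{A - \{z_1, z_2\}}(x) + \cdots.
\end{aligned}$$

For any Z such that $Y \subseteq A - Z$, the strict utility model implies

$$\begin{aligned}
p_{A - Z}(x) &= p_Y(x)\, p_{A - Z}(Y) \\
&= p_Y(x)[1 - p_{A - Z}(A - Z - Y)].
\end{aligned}$$

Substituting these expressions in the previous equation,

$$\begin{aligned}
\sum_{\rho \in R(x, Y)} p(\rho) &= p_Y(x) \Big\{ [1 - p_A(A - Y)] + \sum_{z_1 \in A - Y} p_A(z_1)[1 - p_{A - \{z_1\}}(A - Y - \{z_1\})] + \cdots \Big\} \\
&= p_Y(x)[1 - p_A(A - Y) + p_A(A - Y) - \cdots] \\
&= p_Y(x).
\end{aligned}$$

Conversely, if Eqs. 29 and 30 hold, then we have

$$\begin{aligned}
p_Y(x) = p_A(x) &+ \sum_{z_1 \in A - Y} p_A(z_1)\, p_{A - \{z_1\}}(x) \\
&+ \sum_{z_1 \in A - Y} \ \sum_{z_2 \in A - Y - \{z_1\}} p_A(z_1)\, p_{A - \{z_1\}}(z_2)\, p_{A - \{z_1, z_2\}}(x) + \cdots.
\end{aligned}$$

[...] for Y = A, the strict utility model [...] in general [by] a decreasing
induction on the size of Y. By the induction hypothesis,

$$p_{A - \{z_1\}}(x) = \frac{p_A(x)}{p_A(A - \{z_1\})}, \quad [\ldots]$$

Substituting these into the preceding equation yields

$$\begin{aligned}
p_Y(x) &= p_A(x) \Big[ 1 + \sum_{z_1 \in A - Y} \frac{p_A(z_1)}{p_A(A - \{z_1\})} + \cdots \Big] + p_Y(x) [\cdots] \\
&= p_A(x) F(Y) + p_Y(x) G(Y),
\end{aligned}$$

where F(Y) and G(Y) are independent of x. Rewriting,

$$p_Y(x)[1 - G(Y)] = p_A(x) F(Y),$$

and so

$$\frac{p_Y(x)}{p_Y(y)} = \frac{p_A(x)}{p_A(y)},$$

which yields the strict utility model if we set $v(x) = p_A(x)$.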
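The first half of Theorem 50 can be confirmed by brute force on a small set. The sketch below assumes a hypothetical strict utility scale, builds the ranking probabilities from Eq. 30 by repeatedly choosing the best remaining element, and then checks that summing them as in Eq. 29 returns the strict utility choice probabilities for every subset. Exact fractions avoid rounding questions.

```python
from fractions import Fraction
from itertools import combinations, permutations


def choice_prob(v, Y, x):
    """Strict utility model: p_Y(x) = v(x) / sum of v(y) over y in Y."""
    return v[x] / sum(v[y] for y in Y)


def ranking_prob(v, rho):
    """Eq. 30: pick the best of A, then the best of what remains, and so on."""
    remaining = list(rho)
    prob = Fraction(1)
    for item in rho:
        prob *= choice_prob(v, remaining, item)
        remaining.remove(item)
    return prob


def choice_from_rankings(v, Y, x):
    """Eq. 29: sum p(rho) over rankings placing x above every other element of Y."""
    return sum(ranking_prob(v, rho) for rho in permutations(v)
               if all(rho.index(x) < rho.index(y) for y in Y if y != x))


v = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(3), "d": Fraction(4)}
subsets = [Y for r in range(2, 5) for Y in combinations(v, r)]
agree = all(choice_from_rankings(v, Y, x) == choice_prob(v, Y, x)
            for Y in subsets for x in Y)
print(agree, ranking_prob(v, ("d", "c", "b", "a")))
```

Enumerating all n! rankings is only practical for small n, which is all the illustration needs.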


6.3 An “Impossibility” Theorem

Our final result about rankings concerns the relation of the ranking
model embodied in Eq. 30 and the corresponding model based upon
choices of least preferred outcomes. We let $p_Y$ denote the usual preference
probabilities and $p_Y^*$ the choice probabilities for the least preferred element
of Y. Exactly what empirical interpretation should be given to “choosing
the least preferred element of Y” is not clear. If in estimating $p_Y$ we
simply ask the subject to select from Y the element he most prefers, then
it seems reasonable to estimate $p_Y^*$ by asking which element he least
prefers. However, if we estimate $p_Y$ by giving him the element that he
selects, then what is a comparable payoff procedure to estimate $p_Y^*$?
One possibility is to reward him with an element chosen at random from
Y - {x} when he says he least prefers x. In any event, let us suppose that
$p_Y^*$ exists.

Now suppose that a single random utility model underlies both of these
judgments. The probability that he gives the ranking $\rho$ when he is asked
to order the outcomes from the most preferred to the least preferred is, as
before,

$$p(\rho) = \Pr[U(\rho_1) > U(\rho_2) > \cdots > U(\rho_n)].$$
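The probability of the ranking $\rho$ when he is asked to order the elements
from the least preferred to the most preferred is

$$p^*(\rho) = \Pr[U(\rho_1) < U(\rho_2) < \cdots < U(\rho_n)].$$

Observe that if we let $\rho^*$ denote the reverse of ranking $\rho$, that is,
$\rho^*_i = \rho_{n-i+1}$, $i = 1, \ldots, n$, then

$$p^*(\rho^*) = p(\rho). \qquad (31)$$

A similar argument based upon Eq. 29 yields

$$p(x, y) = p^*(y, x), \quad \text{for } x, y \in A. \qquad (32)$$

Finally, let us suppose that Eq. 30 holds both for the starred and the
unstarred probabilities, that is,

$$p(\rho) = p_A(\rho_1)\, p_{A - \{\rho_1\}}(\rho_2) \cdots p_{\{\rho_{n-1}, \rho_n\}}(\rho_{n-1}), \qquad (33)$$

$$p^*(\rho) = p_A^*(\rho_1)\, p^*_{A - \{\rho_1\}}(\rho_2) \cdots p^*_{\{\rho_{n-1}, \rho_n\}}(\rho_{n-1}). \qquad (34)$$

Theorem 51. Suppose that the [...]

$$p_Y(x) = p_Y^*(x) = \frac{1}{|Y|}.$$

PROOF. [...] with no loss of generality we may assume

$$\sum_{x \in A} v(x) = \sum_{x \in A} v^*(x) = 1.$$

For any $x, y \in A$, let $\rho$ be that ranking of A for which $\rho_1 = x$, $\rho_2 = y$,
and let $\sigma$ be that ranking of A for which $\sigma_1 = y$, $\sigma_2 = x$, and $\sigma_i = \rho_i$,
i = 3, ..., n. By Eq. 33,

$$\frac{p(\rho)}{p(\sigma)} = \frac{p_A(x)\, p_{A - \{x\}}(y)\, p_{A - \{x, y\}}(\rho_3) \cdots}{p_A(y)\, p_{A - \{y\}}(x)\, p_{A - \{x, y\}}(\rho_3) \cdots} = \frac{1 - v(y)}{1 - v(x)}.$$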
By Eqs. 34 and 32,

$$\begin{aligned}
\frac{p^*(\rho^*)}{p^*(\sigma^*)} &= \frac{p_A^*(\rho_1^*)\, p^*_{A - \{\rho_1^*\}}(\rho_2^*) \cdots p^*(y, x)}{p_A^*(\sigma_1^*)\, p^*_{A - \{\sigma_1^*\}}(\sigma_2^*) \cdots p^*(x, y)} \\
&= \frac{p^*(y, x)}{p^*(x, y)} \\
&= \frac{p(x, y)}{p(y, x)} \\
&= \frac{v(x)}{v(y)}.
\end{aligned}$$

By Eq. 31, $p^*(\rho^*) = p(\rho)$ and $p^*(\sigma^*) = p(\sigma)$, so

$$\frac{1 - v(y)}{1 - v(x)} = \frac{v(x)}{v(y)}.$$

Since x and y were arbitrary, v(x) = constant independent of x. Since
$p^*(x, y) = p(y, x)$, it also follows that $v^*(x)$ = constant independent of x.
Thus

$$p_Y(x) = p_Y^*(x) = \frac{1}{|Y|}.$$

This result is due to Block and Marschak (1960, p. 111); it generalizes
to any n a result proved for n = 3 in Luce (1959, p. 69).

Clearly, Theorem 51 is an impossibility theorem. Its conclusion is
unacceptable on empirical grounds; hence its hypotheses cannot all be
correct assumptions in a theory of behavior. Block and Marschak, who
did not make explicit in their formal statement of the theorem that they
assumed a common random utility model and, in particular, that Eqs. 31
and 32 hold, interpreted Theorem 51 as powerful evidence against the
strict utility model (or, what is the same, against Eqs. 33 and 34). There
are, in our view, at least two alternative possibilities. First, one can
question the assumption (embodied in Eqs. 31 and 32) that a common
random utility model underlies the choice of the most preferred and of the
least preferred outcomes and of the ranking from best to worst and from
worst to best. As has been previously pointed out (Luce, 1959), the
assumption $p(\rho) = p^*(\rho^*)$ is not obviously correct as a description of
behavior. Second, one can question whether the whole problem makes
any sense at all, that is, whether choices of least preferred outcomes have
any operational meaning. Thus, although the conclusion of Theorem 51
is empirically unacceptable, it is not evident to us which of its hypotheses
should be rejected.
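The tension described by Theorem 51 shows up already with three outcomes. The sketch below assumes a hypothetical strict utility scale v for choices of the most preferred element and takes the scale for least preferred choices proportional to 1/v, which is what Eq. 32 requires when both families are strict utility models. It then compares p(rho), built best-first, with p*(rho*), built worst-first on the reversed ranking.

```python
from fractions import Fraction
from itertools import permutations


def sequential_prob(scale, order):
    """Eq. 30 applied along `order`: choose each element from those remaining."""
    remaining = list(order)
    prob = Fraction(1)
    for item in order:
        prob *= scale[item] / sum(scale[r] for r in remaining)
        remaining.remove(item)
    return prob


def max_gap(v):
    """Largest |p(rho) - p*(rho*)| over all rankings, with v* proportional to 1/v."""
    v_star = {x: 1 / value for x, value in v.items()}
    return max(abs(sequential_prob(v, rho) - sequential_prob(v_star, rho[::-1]))
               for rho in permutations(v))


unequal = {"x": Fraction(1), "y": Fraction(2), "z": Fraction(3)}  # hypothetical
equal = {"x": Fraction(1), "y": Fraction(1), "z": Fraction(1)}

print(max_gap(unequal))  # positive: Eq. 31 fails
print(max_gap(equal))    # zero: Eq. 31 holds only in the uniform case
```

With unequal scale values the two ranking distributions disagree, so Eqs. 31 through 34 cannot all hold at once, which is the content of the theorem in this small case.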
7. PROBABILISTIC CHOICE THEORIES
FOR UNCERTAIN OUTCOMES

By an experiment that has uncertain outcomes, we mean one in which
knowledge of the stimulus [...]
to determine the actual elementary or “certain” outcome received by
[...] determine a function [...]
random ones in the sense that for $x \in$ [...],

$$\pi(x \mid \omega) = \Pr[\omega(n) = x]$$

is independent of the trial number n.

Another way of characterizing uncertain outcomes is to say that the
presentation plus the response determine [...]
and exhaustive chance events that [...] one-to-one correspondence with
the set of certain outcomes A [...] is delivered to the
subject if and only if the event [...]. The connection between
these two ways of speaking is essentially that between random variables
and sample spaces.

For most purposes of [...] is convenient to use the event
language and to [...] is the event corresponding to
outcome $x_i$, [...] is the probability [...] occurring, and if f is a function
defined over A, [...] $f(x_i)$ [...]

In summary, then, an uncertain outcome is simply a probability
[...] $\pi$ over the set A of certain outcomes. Some authors speak of uncertain
outcomes as wagers or gambles.


7.1 Expected Utility Models

[...] with uncertain outcomes is to [...]
utility hypothesis [...] described in Secs. [...] denote m uncertain
outcomes defined over a set [...] of certain outcomes. Of course,
some of the $\pi_i^k$ may be 0, meaning that outcome $x_i$ cannot be
received by [...] is selected. Let $\mathcal{A}$ [...]. The defining
properties of the several models are the following.

WEAK EXPECTED UTILITY. There is a real valued function w over A
such that

$$p(\pi^k, \pi^l) \ge \tfrac{1}{2} \quad \text{if and only if} \quad \sum_{i=1}^{n} \pi_i^k w(x_i) \ge \sum_{i=1}^{n} \pi_i^l w(x_i).$$

STRONG EXPECTED UTILITY. There is a real valued function u over
A and a distribution function $\phi$ such that

$$p(\pi^k, \pi^l) = \phi\Big( \sum_{i=1}^{n} \pi_i^k u(x_i) - \sum_{i=1}^{n} \pi_i^l u(x_i) \Big),$$

provided that $p(\pi^k, \pi^l) \ne 0$ or 1.

STRICT EXPECTED UTILITY. There is a positive real valued function
v over A such that

$$p_{\mathcal{A}}(\pi^l) = \frac{\sum_{i=1}^{n} \pi_i^l v(x_i)}{\sum_{k=1}^{m} \sum_{i=1}^{n} \pi_i^k v(x_i)},$$

provided that $p_{\mathcal{A}}(\pi^l) \ne 0$ or 1.

In their presentation of these models, Becker, DeGroot, and Marschak
(1963a) state a weaker form of the strict and strong expected utility model
that involves two functions, u and v, and assumes that the relevant scale
values are

[...]

This, however, does not seem to us to be the natural generalization of these
utility models, so we have formulated them in terms of just one function.
Still another possibility was suggested by Suppes (1961). He considered
a learning situation in which a subject must choose from a set of
gambles all of the same form: if $\xi \in \mathcal{A}$, then with probability $\pi(\xi)$ the
subject is reinforced (that is, given a desired outcome) and with probability
$1 - \pi(\xi)$ nothing happens. Using a one-element stimulus sampling model
(see Chapter 10, Vol. II), he arrived at a Markov chain to describe the
transition from one trial to the next. The asymptotic choice probability
turns out to be of the form

[...]

where

v($\xi$) = [...]

and [...] is the probability that the stimulus element is conditioned to
another response when $\xi$ is not reinforced. We describe this model (for
two responses) somewhat more carefully in Sec. 7.3; here we merely point
out its similarity in form to the strict utility model and the considerable
difference in the form of v($\xi$) from that assumed in the strict expected
utility model.

[...] is a random vector U whose components [...]

[...], l = 1, 2, ...

The first three models can readily be [...]
utility models merely by replacing [...] probabilities with
subjective ones. The last one can [...] at least two different
[...] probability function to replace [...]

[...] because next to nothing is [...]

Theorem 52. [...] following special, but testable, [...]
strict and random expected utility models (Becker, [...]
which the (m + 1)st is the average of the other m,

$$\pi^{m+1} = \frac{1}{m} \sum_{k=1}^{m} \pi^k.$$

If the strict expected utility model is satisfied, then

$$p_{\mathcal{A}}(\pi^{m+1}) = \frac{1}{m + 1}.$$

If the random expected utility model is satisfied, then

$$p_{\mathcal{A}}(\pi^{m+1}) = 0.$$

PROOF. For the strict expected utility model, we have

[...]
For the random expected utility model, it is easy to see that equalities
among the random variables occur with probability 0, so

$$\begin{aligned}
p_{\mathcal{A}}(\pi^{m+1}) &= \Pr\Big( \sum_{i} \pi_i^{m+1} U_i > \sum_{i} \pi_i^{k} U_i,\ k = 1, 2, \ldots, m \Big) \\
&= \Pr\Big( \frac{1}{m} \sum_{l=1}^{m} \sum_{i} \pi_i^{l} U_i > \sum_{i} \pi_i^{k} U_i,\ k = 1, 2, \ldots, m \Big) \\
&= 0,
\end{aligned}$$

because the average of several quantities can never exceed each of them.
Corollary. A strict expected utility model need not be a random expected
utility model.
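The strict expected utility half of Theorem 52 amounts to a short computation, which the following sketch carries out with exact fractions. It assumes the strict expected utility form in which the probability of choosing a gamble is its expected scale value divided by the sum of the expected scale values of all gambles offered. The scale values, the gambles, and the function names are hypothetical.

```python
from fractions import Fraction as F


def expected_value(gamble, v):
    """Expected scale value of a gamble: sum over i of pi_i * v(x_i)."""
    return sum(pi * vi for pi, vi in zip(gamble, v))


def strict_eu_prob(gambles, v, k):
    """Strict expected utility probability of choosing gamble k from the set."""
    return expected_value(gambles[k], v) / sum(expected_value(g, v) for g in gambles)


v = [F(1), F(3), F(5)]  # hypothetical scale values of three certain outcomes
gambles = [
    [F(1, 2), F(1, 2), F(0)],
    [F(0), F(1, 4), F(3, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
]
m = len(gambles)

# The (m + 1)st gamble is the average of the other m.
average = [sum(g[i] for g in gambles) / m for i in range(len(v))]
prob_average = strict_eu_prob(gambles + [average], v, m)
print(prob_average)
```

Because expectation is linear, the average gamble has the average expected value, and its share of the total is 1/(m + 1) whatever the scale values are.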

At first glance, this corollary may seem surprising in the light of Theorem
32 of Sec. 5.3, but it should be clear that they are not inconsistent.
It is easy to see that a proof paralleling that of Theorem 29 of Sec. 5.2
establishes that a strong expected utility model is a weak expected utility
model. As stated, a strict expected utility model need not be a strong one,
although this is true for the weaker definitions given by Becker, DeGroot,
and Marschak.

Nothing much else seems to be known about these models, except for one
experiment based on Theorem 52 (see Sec. 8.4).

7.2 Decomposition Assumption

In this subsection we confine our attention to uncertain outcomes of the
simplest possible form, namely, those with nonzero probability on only
two certain outcomes. It is convenient to introduce a slightly different
notation from what we have been using. By xEy we denote the uncertain
outcome from which $x \in A$ is received if E occurs and
$y \in A$ is received if E does not occur. If we let $\mathcal{E}$ denote the set (actually,
Boolean algebra) of events used to generate these binary uncertain
outcomes, then $\mathcal{A} = A \times \mathcal{E} \times A$ is the set of possible uncertain outcomes.
Certain outcomes are included in $\mathcal{A}$ by taking E to be the certain event.
As we have been assuming all along, preference probabilities that the
subject chooses the (uncertain or certain) outcome $\xi$ when $\mathcal{S}$ is presented,
$p_{\mathcal{S}}(\xi)$, $\xi \in \mathcal{S} \subseteq \mathcal{A}$, are assumed to exist. In addition, we postulate that
the subject is able to make judgments about which of several events is
most likely to occur; he must be able to do this, at least crudely, if he is to
make any sense of selecting among uncertain outcomes. Thus, when
$E \in \mathcal{D} \subseteq \mathcal{E}$, we suppose that a probability $q_{\mathcal{D}}(E)$ exists that the subject
selects E as the event in $\mathcal{D}$ that is most likely to occur. We speak of these
as judgment probabilities.




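Suppose that $x, y \in A$ and $E, D \in \mathcal{E}$, and that a subject must
choose between $xEy$ and $xDy$. It is plausible that he decides which
outcome, $x$ or $y$, he would rather receive and, independent of this, which
event, $E$ or $D$, is more likely to occur. It is clear that he should prefer
$xEy$ to $xDy$ in just two of the four possible cases (ignoring
indifferences): when $x$ is preferred to $y$ and $E$ is judged more likely
than $D$, and when $y$ is preferred to $x$ and $D$ is judged more likely than
$E$. If so, and if the preferences among outcomes and judgments of event
likelihood are statistically independent, then it is clear that the following
condition must hold (Luce, 1958).

Decomposition Assumption. For all $x, y \in A$ and $E, D \in \mathcal{E}$,

$$p(xEy, xDy) = p(x, y)\,q(E, D) + p(y, x)\,q(D, E).$$

The following trivial result shows how, when the decomposition assumption
holds, it is possible to estimate the binary judgment probabilities from the
binary preference probabilities.

Theorem 53. If the decomposition assumption holds and if $x, y \in A$ are
such that $p(x, y) \neq \tfrac{1}{2}$, then

$$q(E, D) = \frac{p(xEy, xDy) - p(y, x)}{p(x, y) - p(y, x)}.$$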
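As a small numerical illustration of Theorem 53, the following sketch first generates a gamble preference probability from the decomposition assumption and then recovers the judgment probability from it. The function names and the numbers 0.8 and 0.3 are hypothetical; the only identities assumed beyond the text are $p(y, x) = 1 - p(x, y)$ and $q(D, E) = 1 - q(E, D)$.

```python
def decomposed_preference(p_xy: float, q_ed: float) -> float:
    """p(xEy, xDy) as given by the decomposition assumption."""
    return p_xy * q_ed + (1.0 - p_xy) * (1.0 - q_ed)


def judgment_from_preferences(p_gamble: float, p_xy: float) -> float:
    """Theorem 53: recover q(E, D) from p(xEy, xDy) and p(x, y)."""
    p_yx = 1.0 - p_xy
    if abs(p_xy - p_yx) < 1e-12:
        # p(x, y) = 1/2 makes the denominator vanish, so q is not identified.
        raise ValueError("p(x, y) must differ from 1/2")
    return (p_gamble - p_yx) / (p_xy - p_yx)


# Hypothetical values: p(x, y) = 0.8 and q(E, D) = 0.3.
p_gamble = decomposed_preference(0.8, 0.3)
q_recovered = judgment_from_preferences(p_gamble, 0.8)
print(p_gamble, q_recovered)
```

The guard on $p(x, y) = \tfrac{1}{2}$ mirrors the hypothesis of the theorem: when the subject is indifferent between $x$ and $y$, the gamble choice carries no information about $q$.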

The primary criticism of the decomposition assumption is that judgments
of relative likelihood may not be independent of preferences for the
outcomes. Certainly, this can be arranged by interlocking the outcomes with
the events. L. J. Savage suggested the following example. Let $x$ and $y$
be two greyhound dogs, $E$ the event that $x$ wins in a race between them,
and $D = \bar{E}$ the complementary event that $y$ wins. Thus you will
receive one of the dogs; which one depends upon which gamble you choose and
which dog wins the race. If you propose to race the dog received in the
future, it is clear that in the absence of other differences you would
prefer the dog that wins the race

=;'E==”jr— 

=ESi-r— 

” In correspondence v-tth 
of Irwin and his students (Irwin, 1953; Marks, 1951; and unpublished
data). In the simplest, and most relevant, version of these experiments,
the subject predicts whether or not he will draw a marked card from a
small deck of cards that has a known number of marked and unmarked
ones. The payoffs are as shown:

                          Event $E$
                   Predicted    Not Predicted
Event    $E$           $x$           $y$
Occurs   $\bar{E}$     $y$           $x$

In our terms, the subject must choose between $xEy$ and $x\bar{E}y$. F. W.
Irwin and G. Snodgrass (unpublished) have shown that if, in addition to this
payoff matrix, you reward (or punish) the subject by giving (or taking
away) a fixed sum of money whenever $E$ occurs, independent of what was
predicted, then his frequency of predicting the event appears to be a
monotonic increasing function of this “irrelevant” extra payment. Although
this finding is quite reasonably described as proving that an interaction
exists between preferences and judgments of event likelihood, it does not
really bear upon the decomposition assumption because the choice is
between $(x + a)Ey$ and $x\bar{E}(y + a)$, where $a$ is the amount paid
when $E$ occurs.

The main theoretical results that are known about the decomposition
assumption concern its conjunction with other probabilistic assumptions.
For example, we can easily prove the following result.

Theorem 54. If the decomposition assumption holds and there exist
$x, y \in A$ such that $p(x, y) = 1$, then any property satisfied by the
binary preference probabilities $p$ is also satisfied by the binary judgment
probabilities $q$.

PROOF. Because $p(x, y) = 1$, the decomposition assumption implies that
for any $E, D \in \mathcal{E}$, $q(E, D) = p(xEy, xDy)$, and so any
proposition about the binary $q$ probabilities can be immediately replaced
by an equivalent statement about the $p$ probabilities.

In some cases, the transfer of a binary property from the $p$'s to the
$q$'s can be proved without assuming the existence of $x$ and $y$ such that
$p(x, y) = 1$. For example, we can prove the following result.

Theorem 55. Suppose that the decomposition assumption holds and that
there exist $x, y \in A$ such that $p(x, y) \neq \tfrac{1}{2}$. If the
binary preference probabilities satisfy strong stochastic transitivity, then
the binary judgment probabilities do also.
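PROOF. Let $E, D \in \mathcal{E}$ be given. It is sufficient to show that
strong stochastic transitivity of the $p$'s implies that, for any
$F \in \mathcal{E}$, $q(E, F) - q(D, F)$ has the same sign as
$q(E, D) - \tfrac{1}{2}$. Apply the decomposition assumption, use
$q(F, E) = 1 - q(E, F)$, and collect terms:

$$p(xEy, xFy) - p(xDy, xFy) = [p(x, y) - p(y, x)]\,[q(E, F) - q(D, F)],$$

$$p(xEy, xDy) - \tfrac{1}{2} = [p(x, y) - p(y, x)]\,[q(E, D) - \tfrac{1}{2}].$$

Because $p(x, y) \neq \tfrac{1}{2}$, the common factor is not zero, and so
the sign relation that strong stochastic transitivity imposes on the left
sides carries over to the $q$'s.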

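The other results in Luce (1958) are too complicated to state here.
Suffice it to say that they are concerned with what follows when the
assumption that the preference probabilities satisfy a strong utility model
is added to the decomposition axiom. We prove a related, but simpler,
theorem: the distribution functions of the strong utility model are limited
when both the decomposition assumption and the expected utility hypothesis
hold.

Theorem 56. Suppose that there exist a (utility) function $u$ from
$\mathcal{A} = A \times \mathcal{E} \times A$ into a bounded real interval,
which we take to be $[0, 1]$, a (subjective probability) function $s$ from
$\mathcal{E}$ into $[0, 1]$, and continuous distribution functions $P$ and
$Q$ such that

1. the expected utility hypothesis is satisfied,
   $u(xEy) = u(x)s(E) + u(y)[1 - s(E)]$;
2. the strong utility model is satisfied, $p(\xi, \eta) = P[u(\xi) - u(\eta)]$
   and $q(E, D) = Q[s(E) - s(D)]$;
3. the decomposition assumption is satisfied.

If $A$ and $\mathcal{E}$ are dense in the sense that for every
$\alpha, \beta \in [-1, 1]$ there exist $x, y \in A$ and
$E, D \in \mathcal{E}$ such that $u(x) - u(y) = \alpha$ and
$s(E) - s(D) = \beta$, and if $P(1) \neq \tfrac{1}{2}$, then there exist
constants $\epsilon > 0$ and $k > 0$ such that

$$P(\alpha) = \begin{cases} \tfrac{1}{2} + \tfrac{k}{2}\alpha^{\epsilon} & \text{if } 0 \leq \alpha \leq 1, \\ \tfrac{1}{2} - \tfrac{k}{2}|\alpha|^{\epsilon} & \text{if } -1 \leq \alpha < 0, \end{cases}$$

$$Q(\alpha) = \begin{cases} \tfrac{1}{2} + \tfrac{1}{2}\alpha^{\epsilon} & \text{if } 0 \leq \alpha \leq 1, \\ \tfrac{1}{2} - \tfrac{1}{2}|\alpha|^{\epsilon} & \text{if } -1 \leq \alpha < 0. \end{cases}$$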
PROOF. Let $\alpha, \beta \in [-1, 1]$, and choose $x, y \in A$ and
$E, D \in \mathcal{E}$ such that $\alpha = u(x) - u(y)$ and
$\beta = s(E) - s(D)$. From the expected utility hypothesis, we see that

$$u(xEy) - u(xDy) = [u(x) - u(y)][s(E) - s(D)] = \alpha\beta.$$

Thus, by the decomposition and strong utility assumptions,

$$\begin{aligned} P(\alpha\beta) &= P[u(xEy) - u(xDy)] \\ &= p(xEy, xDy) \\ &= p(x, y)\,q(E, D) + p(y, x)\,q(D, E) \\ &= P(\alpha)Q(\beta) + [1 - P(\alpha)][1 - Q(\beta)]. \end{aligned} \tag{36}$$

Setting $\alpha = 1$ in Eq. 36,

$$P(\beta) = Q(\beta)[2P(1) - 1] + 1 - P(1).$$

Because $P(1) \neq \tfrac{1}{2}$,

$$Q(\beta) = \frac{P(\beta) + P(1) - 1}{k},$$

where $k = 2P(1) - 1$. Substituting this in Eq. 36 and simplifying yields

$$P(\alpha\beta) = \frac{2P(\alpha)P(\beta) - P(\alpha) - P(\beta) + P(1)}{k}.$$

Define $f(\alpha) = [2P(\alpha) - 1]/k$. Substituting this and simplifying
yields

$$f(\alpha\beta) = f(\alpha)f(\beta). \tag{37}$$

Note that $f(1) = 1$.

Because $P$ is continuous and monotonic increasing, so is $f$. It is well
known that when $\alpha \geq 0$ and $f(1) = 1$ the only continuous monotonic
solutions of Eq. 37 are $f(\alpha) = \alpha^{\epsilon}$, where
$\epsilon > 0$. Because $P(-\alpha) = 1 - P(\alpha)$, it follows that
$f(-\alpha) = -f(\alpha)$, and so for $\alpha < 0$, the solution is
$f(\alpha) = -f(-\alpha) = -|\alpha|^{\epsilon}$. Substituting these back
into the expressions for $P$ and $Q$, we obtain the forms stated earlier.
Because $P$ is monotonic increasing and $P(0) = \tfrac{1}{2}$,
$k = 2P(1) - 1 > 0$.
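The conclusion of Theorem 56 can be checked numerically: the power-function forms of $P$ and $Q$ should satisfy the functional equation, Eq. 36, for every $\alpha$ and $\beta$ in $[-1, 1]$. The sketch below does this on a grid. The helper names and the parameter values $k = 0.6$, $\epsilon = 1.7$ are illustrative assumptions, not values from the text.

```python
def signed_power(t: float, eps: float) -> float:
    """f(t) = t**eps for t >= 0 and -|t|**eps for t < 0."""
    return abs(t) ** eps if t >= 0 else -(abs(t) ** eps)


def P(alpha: float, k: float, eps: float) -> float:
    """Preference distribution function of Theorem 56."""
    return 0.5 + 0.5 * k * signed_power(alpha, eps)


def Q(beta: float, eps: float) -> float:
    """Judgment distribution function of Theorem 56."""
    return 0.5 + 0.5 * signed_power(beta, eps)


def eq36_residual(alpha: float, beta: float, k: float, eps: float) -> float:
    """Left side minus right side of Eq. 36."""
    lhs = P(alpha * beta, k, eps)
    pa, qb = P(alpha, k, eps), Q(beta, eps)
    rhs = pa * qb + (1.0 - pa) * (1.0 - qb)
    return lhs - rhs


# Largest violation of Eq. 36 on a grid over [-1, 1] x [-1, 1].
grid = [i / 10.0 for i in range(-10, 11)]
worst = max(abs(eq36_residual(a, b, 0.6, 1.7)) for a in grid for b in grid)
print(worst)
```

The residual is zero up to rounding because $2\,\mathrm{rhs} - 1 = [2P(\alpha) - 1][2Q(\beta) - 1] = k f(\alpha) f(\beta)$, which equals $k f(\alpha\beta)$ by Eq. 37.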

Theorem 56 shows that the combination of the strong utility model with
the expected utility hypothesis and the decomposition assumption is
really very restrictive: the forms of the distribution functions are
determined. It is clear that unless we introduce some ad hoc assumptions
about 0 and 1 probabilities, this model does not apply to situations where
the certain outcomes are simply ordered on a single dimension, for example,
when they are sums of money or, more generally, amounts of something.
Completely different, but also very restrictive, conclusions arise when
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we combine the decomposition assumption with the f 

19 n 3361 even when we do not postulate the expected utility hypotliesis 

l^orsrCtc ,/in, >^eplferenceprota,.U.^ 
axiom that thZudgment probabilities q also satisfy the 
and that the binary p's and q’s satisfy all 

most three equivalence classes , (\9S9, 

The proof of this result is too long to include here, see Luce t 

" A^ough experimental evidence is difficult 
from his experience ."trospections that 

more than three equivalence classes nreference proba- 

that when the assumptions of Theorem 5’ .he type 

bihties over A actually form a expect this to 

discussed in Secs 2 and 3 As .^ 3 , preferences are algebraic 

happen with money outcomes It i j ^ ^ ^ 

when the outcomes are different in , show that preferences are 
bicycle If empirical evidence is pro lj,en the assumptions of 

truly probabilistic among some certain “"“"'’re response W those 
Theorem 57 are too strong to <>f"^'f;f;,‘res reay ultimately be, 

outcomes No matter what oop of a behavioral theory 

Theorem 57 is of interest as an „„ some of the preference 

that leads to strong, being particularly apparent m the 

probabilities without these restrictions being P 

initial assumptions consequences of the hypotheses o 

Very little else is known „f’.ome empirical interest that 

Theorem 57 except for one 7“",“ „ 

IS appropriately described in e 

7.3 Several Theories Applied to a Special Case

.. .«ni to describe behavior when the 
Both learning and utility , .he problem is 

outcomes are uncertain In ® J response probabilities chan^ 

S^^rto^-lnaCT^" - mmj 

structure to be ex^ learning thcones, then 
asymptotic predic 
that we can find experiments where both classes of theories predict the
behavior. Comparisons of this sort may help considerably in choosing
among the various theories.

The special experiment that we shall consider involves two uncertain
outcomes, both consisting of the same probability distribution over a pair
of certain outcomes. It is most easily represented in tabular form:

                                 Choice
                              1          2
Event            $\pi$      $x_{11}$   $x_{12}$
Probability    $1 - \pi$    $x_{21}$   $x_{22}$

Thus, if the subject chooses the uncertain alternative 2, he receives outcome
$x_{12}$ with probability $\pi$ and outcome $x_{22}$ with probability
$1 - \pi$. We confine our attention to one prediction, namely, the plot of
the asymptotic choice probability $p_\infty$ versus $\pi$ when the outcomes
$x_{ij}$ are held fixed. Experimentally, we develop such a plot by carrying
out a series of runs at different values of $\pi$ and estimating $p_\infty$
from what appear to be asymptotic data. We shall examine what ten different
theories, six learning and four utility, say about the nature of this plot.

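ONE-ELEMENT STIMULUS-SAMPLING MODEL. Suppes (1961) suggested a
one-element stimulus-sampling model for this experiment under an assumption
about the outcomes that the experimenter delivers. It is assumed that there
is a single stimulus element that is conditioned at the beginning of each
trial to one of the two responses, and the subject makes that response. If
the response is rewarded, then it is assumed that the conditioning is
unchanged; if it is not rewarded, the element becomes conditioned to the
other response with a fixed probability $\epsilon_i$, where $i$ is the
response made. The one-step transition trees of the element are shown in
Fig. 6. The transition matrix is

                                Trial $n + 1$
                        1                                 2
Trial $n$   1   $\pi + (1 - \pi)(1 - \epsilon_1)$   $(1 - \pi)\epsilon_1$
            2   $\pi\epsilon_2$                     $1 - \pi + \pi(1 - \epsilon_2)$

If $p_n$ denotes the probability of response 1 on trial $n$, then

$$p_{n+1} = [\pi + (1 - \pi)(1 - \epsilon_1)]\,p_n + \pi\epsilon_2(1 - p_n).$$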
Fig. 6. One-step transition trees of the one-element stimulus-sampling model.

At asymptote, $E(p_{n+1}) = E(p_n) = p_\infty$. So taking expectations and
the limit as $n \to \infty$, we find

$$p_\infty = \frac{\pi}{\pi + (1 - \pi)\epsilon_1/\epsilon_2}. \tag{38}$$

It is clear that if $\epsilon_1 = \epsilon_2$, then $p_\infty = \pi$. This
is the well-publicized prediction of probability matching. More generally,
for $0 < \pi < 1$ we have $p_\infty > \pi$ if $\epsilon_1 < \epsilon_2$ and
$p_\infty < \pi$ if $\epsilon_1 > \epsilon_2$. Thus the model predicts that
the response probability either overshoots $\pi$ for all values of $\pi$,
matches it, or undershoots it for all values of $\pi$. Variants of this
model are also considered in Suppes and Atkinson (1960, Chapter 11).

theoretical approach that also

OBSERVING-RESPONSE MODEL ] „ theory but that adds a new 

originates in the tradition of stimulus samp ng ^ 

concept IS the one that P°^'“'"rp,',‘Tt response has been referred 
prior to the observed response ^ f, avoidance response in the 

to as an observing, m Audley (I960). Bower (1959), 

literature The basic P" '^oTthe Estes model is 

and Estes (1960) AsimpMedvem^ on (1961). 

tested in Suppes “"^^Xed by the following five assumptions, furth 

r^reXrtXca^eoftwor.^^^^^^^^^ 

, on every trial each -spon-h“-;',PP.,,^ , tandomly 

2 At the start of each 

‘’'’rTflhe approach .t“ mer r'c^Ce is obseoed 

made If the approacn 
4. If both responses have approach value 0, a randomly selected
response is made after both responses have been observed.

5. With probability $\mu$ a rewarded response obtains the approach value
1 when it had the value 0, and with probability $\lambda$ an unrewarded
response obtains the approach value 0 when it had the value 1.

To apply these assumptions to the outcome matrix given at the beginning 
of tins section, we impose the further restrictions that and 

From these restrictions and the five assumptions we can derive
a 3-state Markov chain, which we do not produce here. The mean
asymptotic result obtained from this chain is that

$$p_\infty = \frac{\pi^2 + \pi(1 - \pi)\phi}{\pi^2 + (1 - \pi)^2 + 2\pi(1 - \pi)\phi}, \tag{39}$$

where $\phi = \lambda/\mu$, that is, the ratio of the two approach
parameters. The probability $p_\infty$ is a monotonically decreasing
function of $\phi$ and it is bounded by the closed interval with end points
$\tfrac{1}{2}$ and $\pi^2/[\pi^2 + (1 - \pi)^2]$.




STRONG AND WEAK CONDITIONING MODEL. A still more promising
modification of stimulus-sampling theory that is designed to account
for the choice of uncertain outcomes with varying payoff was worked out
by Atkinson (1962) and Myers and Atkinson (1964). The central idea is to
assume not only that each stimulus element is conditioned to a response
but that it is either strongly or weakly conditioned. We do not state the
assumptions of the model, but they follow the intuitively obvious course.
For example, if an element is weakly conditioned to a rewarded response,
then with probability $\mu$ it becomes strongly conditioned. On the other
hand, if the response is not rewarded, then with probability $\delta$ it
becomes weakly conditioned to the other response. Similarly, if the element
is strongly conditioned to an unrewarded response, then again with
probability $\delta$ it becomes weakly conditioned to the same response.

It may be shown then that the asymptotic probability of choosing
response 1 is


p, = — zil±jAL=irli / 40 ) 

+ (1 - -nY + ,(i _ 

where = 5/^ Note that is again a monotonically decreasing function 

of , and It IS bounded by the dosed interval with endpoints rr and 


+ U--ny 
EXPERIMENTER-CONTROLLED LINEAR LEARNING MODEL. Bush
and Mosteller (1955) and others have explored a learning model in which
the response probabilities are assumed to be linearly transformed from
trial to trial, the constants of the transformation depending on what
happened on the trial. Little is known about the asymptotic properties
of this model when the linear operators depend on the subject's response,
so we are forced to assume that they depend only on the outcome received,
the so-called experimenter-controlled case.

If we suppose that $x_{11}$ is preferred to $x_{12}$ and $x_{22}$ to
$x_{21}$, then it is reasonable that whenever the event that has probability
$\pi$ occurs the subject should increase his tendency to choose response 1
and that whenever the complementary event occurs he should decrease it.
This leads to the model

$$p_{n+1} = \begin{cases} (1 - \theta_1)p_n + \theta_1 & \text{with probability } \pi, \\ (1 - \theta_2)p_n & \text{with probability } 1 - \pi. \end{cases}$$

Calculating expected values,

$$E(p_{n+1}) = \pi[(1 - \theta_1)E(p_n) + \theta_1] + (1 - \pi)(1 - \theta_2)E(p_n).$$

Therefore at asymptote we have

$$p_\infty = \lim_{n \to \infty} E(p_n) = \frac{\pi}{\pi + (1 - \pi)\theta_2/\theta_1}. \tag{41}$$

Note that by substituting $\epsilon_1 = \theta_2$ and
$\epsilon_2 = \theta_1$, this model predicts the same asymptotic mean
probability as the one-element model (Eq. 38), even though that model
required an additional assumption about the outcomes and this one does not.
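The asymptote in Eq. 41 and its agreement with the one-element model can be checked with a short calculation. The sketch below iterates the expected-value recursion of the linear model and compares the result with the closed form; it also computes the stationary probability of a two-state chain whose probabilities of leaving states 1 and 2 are assumed to be $(1 - \pi)\epsilon_1$ and $\pi\epsilon_2$. All function names and parameter values are hypothetical.

```python
def linear_model_mean(pi: float, theta1: float, theta2: float,
                      p0: float = 0.5, trials: int = 2000) -> float:
    """Iterate E(p_{n+1}) = pi[(1-theta1)E(p_n)+theta1] + (1-pi)(1-theta2)E(p_n)."""
    mean = p0
    for _ in range(trials):
        mean = (pi * ((1.0 - theta1) * mean + theta1)
                + (1.0 - pi) * (1.0 - theta2) * mean)
    return mean


def eq41(pi: float, theta1: float, theta2: float) -> float:
    """Closed-form asymptote of the experimenter-controlled linear model."""
    return pi / (pi + (1.0 - pi) * theta2 / theta1)


def one_element_asymptote(pi: float, eps1: float, eps2: float) -> float:
    """Stationary probability of state 1 in the assumed two-state chain."""
    leave_1 = (1.0 - pi) * eps1   # response 1 unrewarded, then switch
    leave_2 = pi * eps2           # response 2 unrewarded, then switch
    return leave_2 / (leave_1 + leave_2)


pi, theta1, theta2 = 0.7, 0.1, 0.2   # hypothetical parameter values
print(linear_model_mean(pi, theta1, theta2))
print(eq41(pi, theta1, theta2))
print(one_element_asymptote(pi, eps1=theta2, eps2=theta1))
```

With equal learning rates the three numbers reduce to $\pi$, the probability matching prediction; with unequal rates they agree with one another but lie above or below $\pi$.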

SUBJECT-EXPERIMENTER-CONTROLLED BETA MODEL. The beta
model is of the same general character as the linear model except that the
stochastic operators have a simple nonlinear form. The main idea is that
it is not $p_n$ but the quantity $v_n = p_n/(1 - p_n)$ that is transformed
in a simple way (see Luce, 1959, and Sec. 2.5, Chapter 9 of Vol. II).
Specifically, we assume that

with probability 


where B is the parameter correspouding to oulcome , and response; 
T^tprithms, substituting b„ = log ft,, aud caleulaliug espectalious, 

£(log r.r.) - - £ J. 
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K = - i,j) + (1 — irXAji - fcjj) 

and 

It has been shown that if X 0 and if the limit of the expected value of 
v„ IS not zero or infinity, then the asymptotie expected mean is given by 
Eq 42 (Luce, 1959, p 116) The condition /f ;*£ 0 is seen to be violated 
when the events are experimenter-controlled, that is, when and 

hi = bn The general situation is rather complicated For example, if 
■^bn + (1 - rr)6„ < 0 and rri,, -K1 - > 0, then p„ = 1 

There is some formal resemblance between the asymptotic mean of the
beta model, Eq. 42, and of the experimenter-controlled linear model, Eq.
41 (and so of the one-element stimulus-sampling model, Eq. 38). The
difference is that the beta model includes a $1 - \pi$ term in the numerator,

Two nrcH ! thCSC 

experimenter controlled events and so it is less general ^ 

threroldroHpi J", 2 3 and 5 2 of Chapter 3, Vol I, a 

two resnonsps Th * ®®°'**'* fof experiments with two presentations and 

subieet Lter, of 'ha presentation the 

ubject enters into one of two detection” states, the probability depending 

by ie s r hL'’thrs'“h‘‘ assumed to L gwerned 

In the tetm “ response bias that he controls, 

that IS there are n ^ presentations are the same on all trials, 

rling to haunen '“™“" '» -nd'^ete to the subject whal 

behave as if they^think that ther”*’ '* 0°"'^'^'''“'=!= that some subjects 

and an upper hmb bias yields 

bet:cr™ d^* a itnmfmoi “ <>’ -P-enee, and so they can 

5 2 of Chapter 3, we find thm -^“uming the linear process of Sec 


rr-Kl-rrjej/e,' 
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Where 0. and 0. are learning rate parameters If we “-ume that there is a 
value rr such that the subject uses a lower limb bias for tr < and 

upper hmbtasolherwise.^ 

( ^ 

I TT + (I “ 7T)02/fli 


P® = 


if TT < TTo 
if TT > ^To 


( 43 ) 


I ^1—0) - 

I ^ TT + (1 — ir)02/Oi 

Note that when the learning rates slope ? 'tte'upper 

are linear m r. the lower limb 

hmb ends at (1, 1) and has slope ?, ^ j (ed as a random 

them at ,r = Ro Of course, .1 is likely that u. must be mea^^^ ^ population 
variable even for a single subject, an “ J” ^ a discontinuous 

of subjects Thus in difference is sketched in 

function, but rather an average of se 
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1 - TT between Eqs 42 and 44, %e find lhat = 6-2 < 0, which is 
impossibk m the strict ntility model 

Note that if we demand $p_\infty = 0$ when $\pi = 0$ and $p_\infty = 1$
when $\pi = 1$, then $v_{12} = v_{21} = 0$ and Eq. 44 becomes

$$p_\infty = \frac{\pi}{\pi + (1 - \pi)v_{22}/v_{11}},$$

which is exactly the same form as Eq. 41 of the experimenter-controlled
linear model. If $v_{12}$ and $v_{21} \neq 0$, it is easy to see that
$p_\infty$ is defined for all $\pi$, $0 \leq \pi \leq 1$, and that for
$\pi = 0$, $p_\infty > 0$ and for $\pi = 1$, $p_\infty < 1$.
DECOMPOSITION AND CHOICE MODEL. Luce (1959, pp. 86-88)
showed that if the assumptions of Theorem 57 of the previous section are
satisfied (namely, the decomposition assumption and the choice axiom for
both the $p$'s and $q$'s), if for $x, y \in A$, $p(x, y) = 0$,
$\tfrac{1}{2}$, or 1, and if in particular

$$p(x_{11}, x_{12}) = p(x_{22}, x_{21}) = 1,$$

then $p_\infty$ must be a monotonic increasing step function of $\pi$. We do
not reproduce the proof here. It should be noted that no results are known
about the location or breadth of the steps; the theorem merely says that
the function must be a step function.

UTILITY OF VARIABILITY MODEL. Siegel (1959, 1961) suggested an
asymptotic utility model that, in addition to considering the utilities of
the outcomes, incorporates a term to represent the value to the subject of
sheer variation in his responses. Siegel was led to consider this
modification of the usual algebraic expected utility models because of their
clear experimental inadequacy. He first worked it out for the two-response
case, such as the probability prediction experiments in which the subject
attempts to predict which of two mutually exclusive events will occur, and
later Siegel and McMichael (1960) generalized it to $n$ responses.

Siegel supposed that utility arises from three sources: $u_{11}$ is
contributed to the total whenever the event that has probability $\pi$ of
occurring is correctly predicted, $u_{22}$ is contributed whenever the
complementary event is correctly predicted, and $u$ is a contribution due
entirely to variability in the subject's responses, where it is assumed that
variability is measured by $p(1 - p)$, $p$ being the probability of
response 1. Thus the expected utility is

$$E(u) = u_{11}\pi p + u_{22}(1 - \pi)(1 - p) + u\,p(1 - p).$$

To find the probability $p_\infty$ that maximizes expected utility, we
simply calculate the derivative of $E(u)$ with respect to $p$ and set it
equal to 0. This yields

$$p_\infty = \frac{\pi(a_1 + a_2) + (1 - a_2)}{2}, \tag{45}$$

where $a_1 = u_{11}/u$ and $a_2 = u_{22}/u$.
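Eq. 45 can be checked against a direct numerical maximization of the expected utility. The sketch below evaluates $E(u)$ on a fine grid of response probabilities and compares the maximizing value with the closed form. The function names and the utilities $u_{11} = u_{22} = 1$, $u = 2$ are illustrative assumptions, and the comparison presumes that the closed-form value falls inside $[0, 1]$.

```python
def expected_utility(p: float, pi: float, u11: float, u22: float, u: float) -> float:
    """Siegel's expected utility, including the variability term u*p*(1-p)."""
    return u11 * pi * p + u22 * (1.0 - pi) * (1.0 - p) + u * p * (1.0 - p)


def eq45(pi: float, u11: float, u22: float, u: float) -> float:
    """Closed-form maximizer, with a1 = u11/u and a2 = u22/u."""
    a1, a2 = u11 / u, u22 / u
    return (pi * (a1 + a2) + (1.0 - a2)) / 2.0


def grid_maximizer(pi: float, u11: float, u22: float, u: float,
                   steps: int = 10000) -> float:
    """Response probability on a grid over [0, 1] with the largest E(u)."""
    candidates = (i / steps for i in range(steps + 1))
    return max(candidates, key=lambda p: expected_utility(p, pi, u11, u22, u))


# Hypothetical utilities: u11 = u22 = 1 and variability weight u = 2.
print(eq45(0.7, 1.0, 1.0, 2.0), grid_maximizer(0.7, 1.0, 1.0, 2.0))
```

When $u$ is small relative to $u_{11}$ and $u_{22}$ the closed form can leave the unit interval; the maximizing response probability is then 0 or 1 and Eq. 45 no longer applies as written.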

It .s not difficult to see that Eq 45 is a special case of Eq 42 that aro^e 

from the beta model Specifieally.ifwechoose the 6 s so that A„ 

feai - *22. then Eq 42 reduces to 

— *41^) 4- *22 

Pas U h ’ 

[>21 ~~ ^22 

which IS the same as Eq 45 with 

OA 1 2*22 

Therefore any dam supporting Siegel's ^ 

well he interpreted as supporting the rule Indis- 

KBL.T.VE HX-CXEO LOSS 

cussing his data (see Sec 8 3), to _ nnH «• He began 

somewhat ad hoc proposal ^ fo^ uncertain situations 

with Savage’s idea (see Sec 3 4) tha ^ tl^e 

should be based not upon the payo P ^ subject 

differences between what the received This is 

known which event would occur a Savage’s notion of a regret 

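matrix (Sec. 3.4). Assuming numerical payoffs, then the loss matrix in
our simple 2 by 2 situation is

                         Choice
                   1                   2
  $\pi$            0           $x_{11} - x_{12}$
  $1 - \pi$   $x_{22} - x_{21}$        0

The expected loss for each response is therefore

$$EL_1 = \pi \cdot 0 + (1 - \pi)(x_{22} - x_{21}),$$

$$EL_2 = \pi(x_{11} - x_{12}) + (1 - \pi) \cdot 0.$$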
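The loss matrix and the two expected losses can be computed mechanically from a payoff table. The sketch below forms the regret in each event row as the best payoff in that row minus the payoff received, which reproduces the matrix above when $x_{11} \geq x_{12}$ and $x_{22} \geq x_{21}$. The payoff values and function names are hypothetical.

```python
def regret_matrix(payoffs):
    """payoffs[i][j] is the payoff when event i occurs and response j is made.

    The regret is the best payoff available in that event minus the payoff.
    """
    return [[max(row) - x for x in row] for row in payoffs]


def expected_losses(payoffs, pi):
    """Return (EL_1, EL_2) when the first event has probability pi."""
    loss = regret_matrix(payoffs)
    event_probs = (pi, 1.0 - pi)
    return tuple(
        sum(event_probs[i] * loss[i][j] for i in range(2)) for j in range(2)
    )


# Hypothetical payoffs: x11 = 5, x12 = 1, x21 = 2, x22 = 4.
payoffs = [[5.0, 1.0], [2.0, 4.0]]
print(regret_matrix(payoffs))
print(expected_losses(payoffs, pi=0.6))
```

For these values the expected losses are $(1 - \pi)(x_{22} - x_{21})$ and $\pi(x_{11} - x_{12})$, in agreement with the two displayed expressions.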

The relative expected loss is ^ - x,,) 

(EL, + EL,M2 ^ probability is a linear function of 

Edwards’ RELM ,hom is a constant A' such that 

the relative expected lo . _ 

T = 1 + ^Iel. + elJ 
Fig. 8. Typical predictions of $p_\infty$ versus $\pi$ from ten different models.
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PTfd.:r "srchii- 

then the predictions are identical 

3(1.22 + = '’!> + 

b.i - bi2 = 

il 2 , - (>22 = *22 - '^ 2 * 

_ ^ — = K + i 

stiMMAitv Possibly the ^^rnor'^h.^^hlTe 

predict IS to plot specific examples of q „,p,e of 

Fig 8 There are only seven P'“‘^ “he final seven are really 

models do not lead to different ‘ ^ grimentally among them, 

quite different, and so we can hope to decide exp 
The existing data ate described m Sec 

8. EXPERIMENTAL TESTS OF PROBABILISTIC MODELS


8.1 Some Experimental Issues

The experiments intended to >s^ot 

are neither numerous. -riments to perform or how ^ 

certain that we know yet just „rta,n that we do not know 
to implement them, and h j,„p,,ry somewhat these sum 

XratirbefL camming .P, .3 intended to a 

behavior IS “asyr^tol'c J to stabilize human response 

of the order of 500 trials are nerte ^ loTc mns 

his experimental o°"‘';''°"'byrf ehc,ee situations, equallj lone 

proba^hties.ndifiemn^b^^^_^^^_^jp^3^„, 

must be carried ou " -.his approach has 

Aside from '"'P”'"“' f,n-crem presenlalion sc»-'h'r pi 
probabilities ^.mi.s senous areue that 
many presentation sets should be interleaved in some more or less random
fashion, only later to be regrouped when estimating the response
probabilities. Some have gone so far as to say that each presentation set
should occur only once, but this makes it most difficult to test adequately
any probabilistic theory: single observations do not lead to refined
estimates of probabilities other than zero and one.

The primary objection to interleaving a number of different presentations
is that there may exist appreciable, but not easily observed, sequential
effects among the responses: if the general view embodied in the various
mathematical theories of learning is approximately correct, we can hardly
hope for asymptotic behavior when we continually alter the choice
situation. We know of no demonstration of this a priori fear nor of any
answer to it except the fond counterhope that people actually are better
able to uncouple their responses to different presentations than current
theories of learning lead us to believe.

Another difference of opinion exists concerning how much a subject
should know about the chance events that are used to generate the
uncertain outcomes. In experiments arising from the learning tradition,
especially the probability prediction experiments, the subject is told little
or nothing about the mechanism used to generate the conditional outcome
schedules; indeed, his task is to discover the probability structure over
the outcomes. It is in this context that hundreds of trials are needed before
the behavior seems to stabilize. If, however, one is interested only in
asymptotic behavior rather than in the transients of learning, it hardly
seems efficient to force the subject to discover by his own crude, empirical
means much of what can be conveyed accurately to him in words. So,
other experimenters have gone to the other extreme of telling subjects
everything they can about the generation of the outcome schedule except
what will actually occur trial by trial. In some experiments the mechanism
for generating the events, for example, dice, is part of the visible apparatus
and the subject actually sees each chance event run off; in addition, he is
sometimes given complete information about the probabilities of the
several relevant events.

In general, experimenters either have coupled long experimental runs
of a single presentation set with no a priori information about the
conditional outcome schedule or they have coupled interleaved presentation
sets with maximal a priori information about the generation of the
schedule. The other two combinations do not seem to have been much
used (an exception is Suppes and Atkinson, 1960), even though there are
good reasons for trying a single presentation set with complete information:
asymptotic behavior might occur quickly and the problem of interaction
between different presentation sets would not exist.
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in add.t.on to thes= two broad tssues of «Penme„tal “ 

minor differences exist in the Thus existmg preference 

agreement about how they ^ht ^ 

experiments are far less compar animal learning 

experiments in psychophysics or in pa j n is completed 

However we may decide to run be^ to be formulated, 

we face nasty statistical are willing to assume that there is no 

let alone solved Suppose th esUmItes of the response proba- 

variability other than binomial .^liYpothesis of strong stochastic 

bihties and that we wish to test wheth yP then 

transitivity is met, * j""cle« tot when we simply substitute 

P(t, z) ^ max [p(T, y\P^’f>} similar condition, we are going to 

estimates p for p in this or any v) > i when, in fact, 

make three kinds of errors it may ^ hypotheses are not 

p(T, .) < i causing us m fact, p(t, r) J 

really satisfied, it may be tha jt js true, and it may be 

p{x, y), causing us to reject the co causing us to accept 

that p(x, a) S: iie’usutl problem of trying to avoid 

It when It IS false Thus, we h jcnown about testing 

both types of errors Nothing ,mniausible assumptions (see Sec 

this hypothesis except under rat P 1 ,^ 1 ,, ly simpler cases when 

8 2), Hi no work PXli=rdependenee'^of p(*. »P“ 

a theory prescribes an expnci p 

pfe,z), such as the product 

p(ie, 

which derives from the ^‘nct utility model rt 

Lacking satisfactory ‘"la, mns of strong transitivity and on 

for example, the percentage of viol failures are sumc^Py 

some intuitive basis, they '2' iXter situations, tables or pbts 

one IS ,e. 

fcrra^s-tot have a^any rom^tooX" 

.he experiments a«ordinB .oJh^ '".'h: " cn.ers hase 

" '"vS are the only binary .0 pt a. the plot of 

vcrsusn in ai"o- n 
three studies that are concerned with probabilistic expected utility
models. As far as we are aware, except for two minor results that are
mentioned in footnotes, these are the only theoretical ideas that have
been studied empirically.


8.2 Stochastic Transitivity

Of the various ideas about probabilistic choices, the most thoroughly
examined are the three stochastic transitivity properties. The experimental
results are none too consistent, except that weak transitivity seems to
have stood up whenever it was examined; in several papers nothing was
said about it, but we suspect that this means that it was satisfied. Edwards
(1961b, p. 483) makes this comment:

No experiment yet reported has created conditions deliberately designed to be
unfavorable to transitivity, strong or weak, and ended up accepting even weak
stochastic transitivity.* In short, as a basis for psychological theorizing,
algebraic transitivity is dead, and stochastic transitivity, strong or weak, has
yet to be exposed to the adverse climate of hostile experiments. It seems likely
that conditions can be designed in which subjects choose intransitively most of
the time (unpublished research so indicates); it is even possible that the direction
of the intransitive cycles can be controlled by experimental manipulation.

We group the following experiments into three categories according
to the number of observations that were used to estimate the response
probabilities.

ONE OBSERVATION. Davidson and Marschak (1959) performed the
first study in which each choice was made just once. A typical presentation
of the experiment was

              A         B
  ZOJ       -5¢      +36¢
  ZEJ      -21¢      -38¢

where each nonsense syllable refers to the labels on three faces of a die.
Three dice with different pairs of syllables were used to generate the chance
events. A total of 319 different choices were presented to each subject,
of which 15 served as pretraining and to test whether or not the events
had subjective probability of $\tfrac{1}{2}$ (a point that does not concern
us here). The payoffs were actually carried out in 107 randomly selected
cases.

*The wording of this sentence suggests that experiments have been performed
that provide grounds for rejecting weak transitivity; we are not aware of
which they are and Edwards does not give any references.
Each of the 17 student subjects, 11 men and 6 women, was run individually
in three sessions of 35 to 55 minutes each that were spaced one to five
days apart.

There were 76 triples of presentations for which the transitivity
conditions could be checked (the transitivity of “intervals” was also
examined, but we will not go into that). Consider any triple of
presentations $(x, y)$, $(y, z)$, and $(x, z)$. Of the eight possible
observations, six satisfy algebraic transitivity and two do not. The problem
is to use the observations from a number of triples to infer whether or not
it is reasonable to suppose that the underlying probability vectors
$(p(x, y), p(y, z), p(x, z))$ satisfy one or another of the stochastic
transitivity conditions. Each vector must lie in the unit cube $U$; those
vectors satisfying weak transitivity lie in some subset $W$ of $U$, and
those satisfying strong transitivity lie in another subset $S$, where, of
course, $S \subset W$. We consider three hypotheses:

  $H_0$: the vector is distributed uniformly over $U$,
  $H_w$: the vector is distributed uniformly over $W$,
  $H_s$: the vector is distributed uniformly over $S$.

Davidson and Marschak calculated the probability of an intransitive
observation on the assumption that each hypothesis is true; the numbers
are the following:

                  Probability of an
  Hypothesis      Intransitive Observation
  $H_0$           0.2500
  $H_w$           0.1875
  $H_s$           0.1375
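The three tabled probabilities can be approximated by simulation. The sketch below is a Monte Carlo estimate under stated assumptions: the vector is written as $(a, b, c) = (p(x, y), p(y, z), p(z, x))$, it is drawn uniformly from the unit cube and kept only if it lies in the region under test, and the three choices in a triple are treated as independent, so that a cycle occurs with probability $abc + (1 - a)(1 - b)(1 - c)$. The function names, sample size, and seed are hypothetical.

```python
import random
from itertools import permutations


def binary_probs(a, b, c):
    """All six binary probabilities from a = p(x,y), b = p(y,z), c = p(z,x)."""
    return {("x", "y"): a, ("y", "x"): 1 - a,
            ("y", "z"): b, ("z", "y"): 1 - b,
            ("z", "x"): c, ("x", "z"): 1 - c}


def is_transitive(a, b, c, strong):
    """Check weak (strong=False) or strong stochastic transitivity."""
    p = binary_probs(a, b, c)
    for u, v, w in permutations("xyz"):
        if p[u, v] >= 0.5 and p[v, w] >= 0.5:
            bound = max(p[u, v], p[v, w]) if strong else 0.5
            if p[u, w] < bound:
                return False
    return True


def intransitive_probability(a, b, c):
    """Chance that three independent choices form a cycle."""
    return a * b * c + (1 - a) * (1 - b) * (1 - c)


def mean_intransitive(region, samples=100_000, seed=0):
    """Average cycle probability over region 'U', 'W', or 'S' by rejection sampling."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(samples):
        a, b, c = rng.random(), rng.random(), rng.random()
        if region == "U" or is_transitive(a, b, c, strong=(region == "S")):
            total += intransitive_probability(a, b, c)
            kept += 1
    return total / kept


for region in ("U", "W", "S"):
    print(region, round(mean_intransitive(region), 3))
```

The estimates should fall close to the tabled values of 0.2500, 0.1875, and 0.1375, up to sampling error.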


The decision rule that tlmy ^o mrerrorpm^ 

,e hvnothesis under test, ^'^“^furexample.H.^ accepted 


ItrlCe^othesisH, Thus.tor«^^^^^^^^^ 

r of intransitive cycles ^ > c I /f 

Pr{r<c\Hc IS true) = r I ^V„„ u sample s.K 

and this common P-babildy - *-8^" is 0 24. and one 

$$L_w(r, n) = \frac{P_w(r, n)}{P_0(r, n)},$$

where $P_w(r, n)$ is the probability that, when $H_w$ is true, exactly $r$
intransitive observations will occur if $n$ observations are made, and
$P_0(r, n)$ is the same thing when $H_0$ is true. A similar quantity can be
calculated for hypothesis $H_s$.

The data and the two likelihood ratios for the 17 subjects are shown in
Table 4. It is seen that $H_w$ is acceptable according to the decision
criterion for all subjects and that $H_s$ is rejected for two, J and N.

Table 4 Number of Intransitive Observations in 76 Triples
and the Likelihood Ratios for H_w and H_s against H_0 as
Reported by Davidson and Marschak (1959)

                                     Likelihood Ratios
            Number of
            Intransitive
  Subject   Observations     H_w against H_0    H_s against H_0

  A              4                100               2,100
  B             10                 11                  26
  C             11                7.6                  12
  D             11                7.6                  12
  E              1                300              20,000
  F              9                 16                  54
  G              5                 69               1,000
  H              4                100               2,100
  I              4                100               2,100
  J             16                1.2                 0.3
  K              8                 23                 110
  L              2                210               9,300
  M              5                 69               1,000
  N             16                1.2                 0.3
  O              7                 33                 240
  P              7                 33                 240
  Q             14                2.5                 1.3
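The likelihood ratios in Table 4 can be reproduced approximately from the counts of intransitive observations. The sketch below assumes that the $n = 76$ observations are independent binomial trials whose success probability is 0.2500, 0.1875, or 0.1375 under $H_0$, $H_w$, or $H_s$; the binomial coefficient cancels in the ratio. The function name is hypothetical.

```python
def likelihood_ratio(r: int, n: int, theta_alt: float, theta_null: float = 0.25) -> float:
    """P_alt(r, n) / P_null(r, n) for binomial counts; the coefficient cancels."""
    return ((theta_alt / theta_null) ** r
            * ((1.0 - theta_alt) / (1.0 - theta_null)) ** (n - r))


# Subject A had r = 4 intransitive observations and subject J had r = 16.
for subject, r in (("A", 4), ("J", 16)):
    l_weak = likelihood_ratio(r, 76, 0.1875)
    l_strong = likelihood_ratio(r, 76, 0.1375)
    print(subject, l_weak, l_strong)
```

Under these assumptions the computed ratios agree with the tabled entries to the two significant figures reported, for example about 100 and 2,100 for subject A and about 1.2 and 0.3 for subject J.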


It is noteworthy that the number of intransitive observations dropped
over the three sessions: the group percentages were 13.9, 10.9, and 6.8,
with an over-all average of 10.4%.

Davidson and Marschak concluded that the experiment provides no
evidence for rejecting weak transitivity and that in only two cases is strong
transitivity in serious doubt. The difficulty with this conclusion, as they
recognized, is that the experimental design was such that these tests are
not very powerful. As Morrison (1963) has pointed out, if each pair of $n$
stimuli is presented, the maximum proportion of intransitive triples is
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a,, -- W .. »™S 

IS trying very hard to be intranativ . u or H„ whether or 

have little hope of rejecting either nu Moreover, the 

not weak and strong stochastic ^ f distributions over the 

formulation of the ''5'?°^’'®“®'" “™* questionable, however, without a 
relevant regions of the unit cube 1 , _„t clear what distribution we 

far more complete theory of behavior, it is not clear what 

should assume 
Table 5 Scale Values 

Programming Models to th Ten^Categones of Newspaper 

Women Expressing Preference for len taar g 



Government 

News 

1000 

242 

1000 

276 

Medical and 
Health News 

758 

43 

724 

69 

Feature 

Columns 

715 

22 

655 

69 

Editorial 

Comment 

693 

121 

586 

69 

Economic 

News 

572 

20 

517 

69 

Science 

News 

552 

241 

448 

138 

Crime and 

311 

310 


310 

69 

Accidents 

Sports 

1 

135 

241 

103 

Personal and 

175 

175 

I3S 

13^ 

Social News 

0 
For further discussion of the statistical problems when only one
observation is made per presentation, see Block and Marschak (1960) and
Morrison (1963).

A large-scale study of newspaper content preference was made by
Sanders (1961), who utilized the Thurstone pair comparison model and
the linear programming model of Davidson, Suppes, and Siegel (1957).
A sample of 282 men and women was asked to state their preferences for
ten categories of newspaper content presented in pair comparisons.
The results of applying the two models to the data are shown in Table 5.


Table 6 Number of Circular Triads Observed in 45 Pair Comparison
Choices by 282 Men and Women (Sanders, 1961, p. 23)

Number of Circular Number of Number of Circular Number of 


Subjects Triads Subjects 



lodd programming 

formed to a rn mom ”nh the scale values trans- 

models yield very s.mda’r presence ^alef" ''™"’ ‘™ 

strong stochSKtr'Ltmvity h^d^rcounUh 

observed m the 45 nair%. number of circular triads 

resnlts are shown m ^^ble 6 ^hese 

random basis the exnect«»H u made on a completely 

shows, almos’t all S 282 subTe^: Table 6 

Circular triads In fact the m produced a far smaller number of 
triads not more than three such 


A FEW OBSERVATIONS 
Papandreou ct a! (1957) 


Two studies fall into this small sample range 
prepared alternatives of the following type 



models 
A triple c = (c_1, c_2, c_3) consists of imaginary alternatives

c_1 = 3 tickets to x and 1 to y,
c_2 = 2 tickets to x and 2 to y,
c_3 = 1 ticket to x and 3 to y,

" 1 at S3 60 and x and y were chosen from ten 
where eaeh ticket was valued ^ . opera, etc , and 

activities, five of which were “ ' ‘ game, a tennis match, etc 

five of which were “-‘“etic” such as a baseball ga^ 

These outcomes were ^gs offered six times, so that a 

the three binary choices from each p orders of 

total of 6 X 3 X 45 = 810 choice ^ subjects 

things were randomized in 1*'^“^'* over a period of 2i weeks 

participated, their responses being ,„g„t was criticized on the 

After these data were ’ J^r stro^ngly one kind of activity to the 

grounds that if a person were ^ ^ ® ^ one and so the numbers 

other, then he would only ^ „ng and would automatically 

of tickets of that activity ob^l.on led the authors to 

ensure transitivity This quite ,„ples in which each c, 

replicate the study with five J . , „ but no commodity appear 

was constructed from two basic eommodit e . t sm 

in more than one c, of ‘"f were constructed, .0 wh h 

commodities Twenty ‘"P'' ° ,„g a set of 32 that was E^ich 

-■ 

ra"hoTo^h7nrirhypoThes.s.hm^^^^^^^ 

) arc given m Taoic / 

further because the I tr, Picnts m 

“ernl'ntallsample.^^^^^^^^ 
esents, the only ones 
of stems and heads; in the other, they knew only the composition of a
nseed sample of 10 that they saw. Three pairs of payoffs were associated 
with the events: (25¢, -25¢), (25¢, 0), and (0, -25¢). Because a pair of
payoffs could be assigned to heads and stems in either order, the same
choice was generated in two ways. With three pairs of payoffs, this means
that there were six uncertain outcomes associated with each pair of
events. Assuming that the decomposition assumption (Sec. 7.2) is correct,
as Chipman did, this yields six independent observations from which to
estimate the probabilities.

Table 7 The Frequency Distributions of
Likelihood Ratios Reported by Papandreou
et al. (1957)

Study     λ ≥ 1     λ < 1     Unclassifiable

  1        104       101            5
  2         89        13            2


A total of ten male students participated in the study.

There are only two triples of uncertain outcomes in this study for
which stochastic transitivity can be tested. For one triple, all subjects
satisfied weak and moderate transitivity and four appeared not to satisfy
strong transitivity. For the other triple, weak (and so moderate and strong)
transitivity was violated by one subject, and strong by five others.
Chipman did not express an opinion about how these results should be
interpreted, but he cautioned the reader about the smallness of the sample
size. To get some notion of what we might expect with small samples, we
carried out Monte Carlo runs for three sets of three probabilities satisfying
strong stochastic transitivity (in fact, satisfying Eq. 46, p. 379, that derives
from the strict utility model); the percentages of violations of the several
transitivity properties are shown in Table 8.
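A Monte Carlo run of this kind can be sketched as follows. This is not the original calculation: the number of observations per pair, the number of runs, and the function names are assumptions chosen for illustration, and the true probabilities used in the example (0.60, 0.90, 0.93) are one of the sets listed with Table 8. Estimated probabilities are kept as exact fractions so that ties are compared without rounding error.

```python
import random
from fractions import Fraction
from itertools import permutations

HALF = Fraction(1, 2)


def transitivity_flags(p_xy, p_yz, p_xz):
    """Return (weak, moderate, strong) for one triple of binary choice probabilities.

    For every ordering (a, b, c) with p(a, b) >= 1/2 and p(b, c) >= 1/2, the
    property requires p(a, c) >= 1/2 (weak), >= the smaller of the two
    (moderate), or >= the larger of the two (strong).
    """
    p = {("x", "y"): p_xy, ("y", "z"): p_yz, ("x", "z"): p_xz}
    for (a, b), value in list(p.items()):
        p[(b, a)] = 1 - value
    weak = moderate = strong = True
    for a, b, c in permutations("xyz"):
        if p[(a, b)] >= HALF and p[(b, c)] >= HALF:
            weak = weak and p[(a, c)] >= HALF
            moderate = moderate and p[(a, c)] >= min(p[(a, b)], p[(b, c)])
            strong = strong and p[(a, c)] >= max(p[(a, b)], p[(b, c)])
    return weak, moderate, strong


def simulate_failures(true_p, n_obs, runs, seed=0):
    """Percentage of runs in which the ESTIMATED probabilities violate each property.

    true_p holds the true p(x, y), p(y, z), p(x, z); n_obs is the assumed
    number of independent observations per pair.
    """
    rng = random.Random(seed)
    failures = [0, 0, 0]
    for _ in range(runs):
        estimates = [
            Fraction(sum(rng.random() < p for _ in range(n_obs)), n_obs)
            for p in true_p
        ]
        for index, satisfied in enumerate(transitivity_flags(*estimates)):
            failures[index] += not satisfied
    return [100.0 * count / runs for count in failures]


if __name__ == "__main__":
    print(simulate_failures((0.60, 0.90, 0.93), n_obs=6, runs=2000))
```

Because strong transitivity implies moderate, and moderate implies weak, the three failure percentages are necessarily non-decreasing in that order.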

A MODERATE NUMBER OF OBSERVATIONS. Again, two studies fall
into this category. In the first, Coombs (1958, 1959) obtained preference
reports on shades of gray from four subjects, two men and two women.


Chipman also looked at several other things, among them a single direct test of the
equation




that arises from the strict utility model and from the choice axiom when the several
probabilities are different from 0 and 1. The test seems inappropriate, however, because
one of the probabilities was estimated to be 0 for all ten subjects, in which case the
equation is not necessarily expected to hold. It does not.
We wish to thank Richard Willens for carrying out these calculations.
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The Stimuli were 12 gray chips. « a"^uS^ 
black. The presentation sets f^i'^Xences-preferen^^ 

were instructed to rank them accor ^ng P^^ different presentation 

for what purpose was not specified. E occurred in 45 of the sets, 

sets was presented twice. Each pair o hipher than the other in the 

and the number of times that one was ranked h gher 

90 opportunities for comparison wp P . jo^olves a 

bina^ choice probability. As we pointed out in Sec. 

M «f Moderate, and Strong 

Table 8 Percentage of Failures of Probabilities from 

Stochastic Transitivity ^““^Probabilities that Satisfy Strong 
Monte Carlo Runs Subject to Irue rr 

Stochastic Transitivity. oiie (Eq 46, p. 379) 

E„i each of .he three eels of S of 5%, -4 " 

estimated prohahilities were calculate calculated from, respectively, 

A,?SS;e?Once CoumedThr«Th^ 


p(x, y) = 0.60
p(y, z) = 0.90
p(x, z) = 0.93


19 ^ ^ 

P- -t'-TtiP a 

convincing data ab j,c,i„r weak ^ h subject in such a 

The motivation for <hc «^ Coombs O^S) %_ For a 

random utility "°f''„'rofdi„g technique (see end of Sec. 

generalization of h.s 
Fig. 9. Distributions of stimuli and an ideal point in Coombs' random utility model.

formal statement of the model, see Coombs, Greenberg, and Zinnes
(1961). In words, the main idea is that random utilities can be assigned
not only to the stimuli but also to the subject, representing his ideal
outcome, and that preferences are determined by the absolute values of
the deviations from the subject's ideal. Specifically, if U is the random
vector over stimuli and I the random variable associated with the subject,
then -|U(x) - I| is the random utility assigned to stimulus x in the usual
random utility model (Def. 20, p. 338), and so

p(x, y) = \Pr[\,|U(x) - I| \le |U(y) - I|\,].
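The choice probability defined by the ideal-point model can be estimated by simulation. The sketch below is only an illustration under added assumptions: the stimulus utilities and the ideal are taken to be independent normal variables with a common standard deviation, and the means, sample size, and function name are hypothetical. It compares a pair lying on the same side of the ideal with a pair lying on opposite sides of it.

```python
import random


def choice_probability(mean_x, mean_y, mean_ideal, sd=1.0, samples=20000, seed=0):
    """Estimate p(x, y) = Pr[|U(x) - I| <= |U(y) - I|] by simulation.

    Assumes U(x), U(y), and I are independent normal variables with the
    given means and a common standard deviation (assumptions of this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        ideal = rng.gauss(mean_ideal, sd)
        u_x = rng.gauss(mean_x, sd)
        u_y = rng.gauss(mean_y, sd)
        # x is preferred when its utility is nearer to the sampled ideal.
        if abs(u_x - ideal) <= abs(u_y - ideal):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    # Both stimuli well below the ideal: variability of the ideal hardly matters.
    print(choice_probability(mean_x=1.0, mean_y=0.0, mean_ideal=5.0))
    # Stimuli on opposite sides of the ideal: variability of the ideal matters a lot.
    print(choice_probability(mean_x=4.0, mean_y=6.5, mean_ideal=5.0))
```

In the first case the stimulus means differ by one standard deviation and the nearer stimulus is chosen about three times in four; in the second case the means differ far more, yet the preference is much weaker because the ideal itself varies between them.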

ranHo “f argument, that each of the several 

r^odrdJfT^i" " <i'slnbuted according to some uni- 

Now cofs ? f Thurstoman models 

SiT’e pairs of sttmuh that are generated 
nf th T ^ <*'«ributions that he on the Mme side 

of the distribution of the .deal point, as A and B do in Fig sTen the 

Co'rmbfea'’h of the Ideal is irrelevant in the comparison® Such a pair 

S ParticZ/vlr i'" “ “"1 C do, Ln 

difference in the random variable makes a great deal of 

iZrT Such a patr he calls 

tran“;rff we'ihlrV,r“'^^ *' =‘°=l-ast.c 

mean S pomTas ” Fm m r*'"™"* """S ‘"o 

All three offte stimuli mfv he’ ™ T" '*'**‘"8“'"'' three possibilities 

,„;r .s: rc,s 



"“•"fvo™ when u.e ,e.le „ folded ,bo„. ,h. „o™„u,y 
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E falls between A and C, then the ‘"P]® two (D, B, C) 

when the isolated one either is nearer 

or IS further than the other two from the ideal {A, B. r), 

IS called bilateral adjacent „„„„„ the three distributions of a 

Note that the regions of o^rlap « the^thre ^ 

unilateral triple are Jy Thurstone or weaker assumptions 

variable Thus, making the nrediction of strong stochastic 

(see Theorem 44, p 348), the S^f^pP ^i ^ral triples, variab.hty 

transitivity holds for unilateral triples For bilateral p 

Table 9 Number of Violations of Strong Stochastic Transitivity 
Reported by Coombs (1958) 


Bilateral Spht 

Subject Cases Violations 


Unilateral Bilateral Adjacent 

Unilateral Violations 

Cases Vioations Cases 


V, htv of the distributions when 

transitivity m be saUstied each rorfer ^ of 'he 

Totestthishypo^P^^^^ 

types Coombs did J asked , 1^ predicted the 

stimuli was by brigh n . . j ,0 that the fo possible 

could be chosen for each , °L ,nd vmLtmns strong 

observed preferenc ^ach triple was classifi 9. 

Once this was done, ooonled The re one cannot 

r— - o 

Perhaps the most imp 




model are a characterization of those preference situations for which the
concept of an ideal is appropriate (it seems to be for this experiment, but
it is much more questionable for money outcomes) and derivations of
more of its mathematical properties. An interesting theoretical discussion
of Coombs' experiment from a somewhat different viewpoint is to be
found in Restle (1961, Chapter 4).

The second study having a moderate number of observations per
presentation set was reported by Griswold and Luce (1962). Five subjects
made choices among 74 different binary uncertain alternatives that were
each presented, depending on the subject, from 32 to 50 times in random
order. In 34 of the presentations the outcomes were sums of money
ranging from 1 cent to 50 cents, and in the remainder they were packages
of cigarettes of different brands. The chance events were generated by a
simple pinball machine, and the events were run off after each choice was
made. No payoffs were made during the course of a session, but at the
end of each session one of the money and one of the cigarette pairs were
chosen at random and the subject was paid off according to the outcome
that had previously been determined.
With the cigarette outcomes, it was found that strong stochastic
transitivity appeared to be violated in 18 out of 67 possible cases.
Arguments based upon apparent shifts in preference among the cigarette 

Snrir 1 violations may not be as 

nfavorable as it seems For money outcomes, only cases of pure 

reported, there was only one violation of transitivity m 56 cases which 

not satisfv the sa” * among well-ordered outcomes may 

not satisfy the same model as those among more complex outcomes » 


8.3 Plots of p∞ versus π

the depeLenS ofThe asvmDtohc''ch"* studies have been concerned with 
probability tt The vast maiontv ofth'* •"'“’’“’'‘'dy on the outcome 

ralnirv^c 

I:™™ 
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considerably m their details, and a number X" 

of interest W learning theorists which, however 

concern here It does not seem appropnate to try to describe 

detail, especially since most of these details seem to o" ° 

effect; on the plot of p„ versus ir. so we simply outline the mam 

that ate common to all designs „„ which of two events, for 

The subject is repeatedly required to P^^mt *“01. 

example, which of two lights, will schedule^’ with probabilities 

programmed according to a simple ra subjects at the 

ir and 1 - ir, these probabilities are one of the 

beginning of the experiment I" jwo conditional 

two events occurred on a given tri , experimenter 

outcome schedules were indepen en would have happened if he 

had the option of informing the su ^c 

had made the other response /l^rtoward rnger runs It is 

run, with the more recent studies g asymptotic 

generally conceded today that the 

before two or three hundred trials, P averages for groups of 

The data reported in the ^^ur iTd theTarg«t well over 100 

subjects, the smallest group size bei g , indicates that these 

Considerable evidence (not much of P““ of estimated p. 

group results can be quite "‘^leading sometimes there is little 

over subjects often seems not ^ uailcv of the distribution coinciding 

doubt bm that It is bimodal, with the v^ey 01 

roughly with the group mean pro asymptotic results t a 

In 1956, Edwards presented a estimate of the asymptote 

had been reported up to 1955 ,t) . ” If one compares h.s 

(subjective, but as unbiased as I when the data curves had 

estimates with the original ' ^ Edwards extrapolated reason ^ 

not leveled offtotheirasymptoti I asymptotes As ‘h> 

smooth curves to determine h-* Y'Tre'pmt the observed values averaged 

ca;eT"i;c cu"es"re •’£‘:^"bTe‘’JoTot 

estimates are given i exchcit payoff matrix w cents 

““'’'■“"TTJ.--"— 

» Schedules on« ^ 

to the simple rando 
Table 10 "Asymptotic" Group Mean Probabilities from Two-
Response Probability Prediction Experiments with Simple Random
Schedules and No Payoff Matrices
(The number of trials shown is the total run for each condition.)


                                   Size of     No. of            Est.
Experimenter                       Group       Trials     π      p∞      Comments

Grant, Hake &                      37          60         0.00   0.00
Hornseth (1951)                                           0.25   0.24*
                                                          0.50   0.55
                                                          0.75   0.78*
                                                          1.00   1.00

Jarvik (1951)                      29          87         0.60   0.57
                                   21                     0.67   0.67
                                   28                     0.75   0.76

Hake & Hyman (1953)                10          240        0.50   0.50
                                                          0.75   0.77

Burke, Estes &                     72          120        0.90   0.6?
Hellyer (1954)

Estes & Straughan                  16          240        0.30   0.28*
(1954)                                         120        0.50   0.48
                                               120        0.85   0.67*

Neimark (1956)                     20          100        0.66   0.62    When outcome of
                                               66         1.00   1.00    other choice is
                                                                         unknown, 0.60 and
                                                                         0.99

Gardner (1957)                     24          450        0.60   0.62
                                                          0.70   0.72

Engler (1958)                      20          120        0.25   0.29*
                                                          0.75   0.71*

Cotton &                           24          450        0.60   0.64
Rechtschaffen (1958)                                      0.70   0.74

Neimark &                          36 x 3 runs 100        0.67   0.63
Shuford (1959)
Table 10 (continued)


Experimenter 


Rubinstein (1959) 


Size of 
Group 


44 

41 

37 


No of 
Tnals 


Est 


Comments 


126 0 67 0 65 The three conditions 

0 67 0 69* differed in the 

0 67 0 78 representation of 

the event 


Anderson & 
Whalen (1960) 


300 0 50 0 52 

0 65 0 67 
0 80 0 82 


Suppes & 
Atkinson (1960) 


30 


240 0 60 0 59 


Edwards (1961a) 


1000 


0 30 on 
0 40 0 31 
0 50 0 40 
0 60 0 69 
0 70 0 83 


Myers et al (1963) 


400 0 60 0 62 
0 70 0 75 
0 80 0 87 


Friedman et al (1964) 80 


288 


0 80 0 81 The third of three 


experi! 


mental 


* Definitely appears not to be asymptotic.
himself the general comparisons b'Wee" ‘hese numerous exp 

ments without payoff matrices ap^ however, payoffs are 

hypothesis, at least for Stoup probability matching is not con 

included. It IS clear from Table n tha‘ Prob J j 

;“r ri 

curse IS „ ^mewhefc m the neighborhood 

matching prcdicxi 
Table 11 "Asymptotic" Group Mean Probabilities from Two-
Response Probability Prediction Experiments with Simple Random
Schedules and Payoff Matrices

Except where noted, the payoffs are in cents. (The number of trials shown is the total run for
each condition.)


Size of No of Est. 

Experimenter Group Trials Payoff Matrix rr Comments 


Goodnow (1955) 

10 

14 

14 

120 

(-1: -1) 

0 50 

0 70 

0 90 

0 52 

0 82* 

0 96 

Edwards (1956) 

24 

150 


0 30 

0 18* 




\-5. lo) 

0 50 

0 48 





0 60 

0 62 





0 80 

0.96* 


Edwards (1956) 6 150 / 4. — 0 50 0 59 When outcome of 

1—2. 4) 0 70 0 85 otherchoice IS un- 

0 80 0 98 known. 0 59, 

0 86. 0 94 

Edwards (1956) 6 150 / 4. — 2\ 0 50 0 30 When outcome of 

\^*-2, 12^ 0 70 0 46* other choice IS 

0 80 0 80* unknown, 0 42, 

0 90 0 95 * 0 63,0 84, 0 93*. 


Galanter & 
Smith (1958) 


Siegel & 
Goldstein (1959) 


Suppes &. 
Atkinson (I960) 


Siegel A Abclson 20 
(m Siegel. 1961) 


0 75 0 78 

075 090 


(i:?) S 


0 50 0 58 

067 071* 

0 75 0 79* 

0 75 0 75 

0 75 0 86 


S' 

/5. 0\ 0 75 0 86 

[a. 5} 

-5^ 0 75 0 95 

0\ 0 60 0 63« 

\0. 1/ 0 80 0 73* 

( S, — 5\ 0 60 0 64 

5) 

/ to, — 10\ 0 60 0 69 

10. loj 

/ 5. -s\ 065 075 

5. 5j 0 75 0 93 


Payoffs were 
known step* 
functions of the 
number of correct 
responses 


Six pairs of 
alternatives were 
interleaved for a 
total of 360 trials. 
Table 11 (continued)

Experimenter 


Size of No of „ , 

Group Trials Payoff Matrix 


Myers et al 
(1963) 



-1) 

0 60 

0 65 


-1) 

0 70 

0 87 

(-!: 

-!) 

0 80 

0 93 

10 

— 10 

-12) 

0 60 

0 71 

10. 
— 10 

-10\ 

loj 

0 70 

0 87 

10 

-10, 

-10\ 

lo) 

0 80 

0 95 


* Definitely appears not to be asymptotic.

intersects the 0 and 1 lines somewhat before reaches 0 
It really is straight cannot be line and its 

now available It is also clear that matrix (see Edwards, 1956, 

crossing point are functions of the 

Galanter & Smith, 1958, and experiments bears a few words. 

One of the Suppes and A‘k'"“^^’^,e„ent stimulus-sampling model 
described in Sec. 7.3. There were four levers, each having a fixed proba-
bility of a one-cent reward (the probabilities were 0.2, 0.4, 0.6, and 0.8).
On each trial, the subject was told which two to select from. Each of the
six pairs was presented sixty times, and in the analysis these sets of trials
were dealt with separately as if they were independent of the others
among which they had been interleaved. Thus, in contrast to the other
experiments of this section, this one involved the interleaving of several
presentation sets. These data, including some of their sequential features,
were analyzed in considerable detail by Suppes and Atkinson in terms of
the stimulus sampling model, and they concluded that the correspondence
between theory and data is not good. As far as the asymptotic results are
concerned, the mean learning curves suggest that the subjects were not
stabilized at the end of 60 trials. On the other hand, the fit of the observing-
response model described in Sec. 7.3 to these data is much better.
A PREFERENCE EXPERIMENT. Luce and Shipley (1962) attempted to
test the prediction of the decomposition assumption and the choice
axiom that, under certain conditions on the payoff matrix, the plot of p∞
versus π is a monotone increasing step function. They employed a design
in which several different presentations were interleaved over trials and
in which the subjects had complete information about the mechanism
generating the chance events.

On each trial, the subject was presented with a card of the form 


Event 


T^ii 

■Uzi 


and™ toM t The outcomes were pomts, 

shte of rs2M r t determined the subject's 

C ^or one se T “"'y '’y" transforma- 

P. v=r!n ° ,S a ; ten f '""l^^htres needed to prove that 

not sat,l? these meqnahhes were 

as m the Mosteller ^TNomcTlOsn*’^ tumbling five dice in a wire cage, 
were ranked into hands much ^ ^yenment The possible outcomes 
ordered list of the 252 hn h a ®^ch subject was given an 

outcome'ltLx'LfcoXTer:^*^^^ 

probability range, which^was selert ^ spanned a 0 2 

g . wnich was selected on the basis of some preliminary 
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runs Each of the 90 event-matnx combinations 

successive blocks of 90 trials for a total o 50 presentations each 

the experiment Within each 90 trial block, the order 

The five student subjects were run as a group by a single experimenter, 

but they received independent presentations discontinuous 

Fortwoofthesubjects,theplotsofp«versusnares.mpled,sconti^^^^ 

functions from 0 to 1 The discontinuity for one subjec ^ 

the rational breakpoint according “ “ for the other three 

cruenon, and for the other it was a 0 _Tte data 

subjects are shown m Fig 12 Altnoug ue-rvatinns when we 

maLmatical function might nnderly .“fsec 7 3 (see 

compare them with the seven types o p functions of the 

Fig 8,p 376), they seem most consistent 

decomposition choice model Luc ..„,..eaus' might have resulted 
techniques to decide whether the observ p concluded 

from Lomial variability and a continuous ogive, and they 

that this was most unlikely nrobability prediction 

It ,s evident that these results and Pg^rdiirerLes that 

experiments are not particularly comp of the same presenta- 

might contribute to the groups of subjects 

tion versus interleaved presentation nf asymptotic stability m the 

versus individual plots, and a possible lack of asymp 
preference experiment 


8.4 Probabilistic Expected Utility Models


..tmity It will be recalled 
RANDOM VERSUS STRICT 5),mvcd that if a uhoicc IS 

that Becker, DcGrool, and M“'”*“u,„uUernatives plus the average of 
made from a set consisting ofm ^ u„h probib.litj 0 when 

these m, then the average alternative “"'"^""fmiluy 

the choice probabiht'a* ® ‘ ^ ,he „n,t 

With nrnHnbililV !/("» + IV.. i- « Inter nP«r (1963b) incsc 


ithors performed I „nccmin alternatives - , 

:cc..cd no monc> d.ffcrcncc. m bchawor. ih resu 

;incc there n 

ombined 
[Fig. 12: estimated choice probability plotted against event probability, with separate panels for Subjects 1, 2, and 3; source given as (1962, p. 45).]
There is little doubt that the random expected utility model is wrong:
60 of the 62 subjects chose the average alternative at least once. Moreover,
the data also reject the strict expected utility model, at least for some
subjects. That model predicts a binomial distribution with p = 1/3.
Table 12 compares the observed frequency distribution of the 62 subjects
with the predicted one (using the normal approximation to the binomial
distribution). It is evident that the discrepancy is larger than can be
expected by chance. Becker et al. argue that one must conclude that at
least 18% of the population fail to satisfy this model.
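The predicted column of Table 12 can be approximated with a few lines of code. The sketch below rests on assumptions that the text does not spell out: that the binomial parameter is p = 1/3 with 25 trials per subject, and that the predicted count for k choices is taken as the normal probability of the interval (k - 1, k] without a continuity correction. Under those assumptions the first four rows come out near the tabulated 2.04, 2.88, 5.06, and 7.75; the function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def predicted_subjects(upper, lower=None, n_trials=25, p=1 / 3, n_subjects=62):
    """Expected number of subjects whose count falls in (lower, upper].

    Uses the normal approximation to the binomial with no continuity
    correction (an inference from the tabulated values, not a stated rule).
    lower=None means there is no lower bound.
    """
    mean = n_trials * p
    sd = sqrt(n_trials * p * (1.0 - p))
    upper_area = normal_cdf((upper - mean) / sd)
    lower_area = 0.0 if lower is None else normal_cdf((lower - mean) / sd)
    return n_subjects * (upper_area - lower_area)


if __name__ == "__main__":
    print("4 or fewer:", round(predicted_subjects(4), 2))
    for k in (5, 6, 7):
        print(k, round(predicted_subjects(k, k - 1), 2))
```

Eighteen subjects were observed in the first row against a prediction of about two, which is the discrepancy the text describes as larger than chance would allow.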

Table 12 Numbers of Subjects Selecting the Average
Alternative (see text)

Number of Choices in 25           Number of Subjects
Trials of Average Alternative     Observed    Predicted

          ≤4                         18          2.04
           5                          5          2.88
           6                          2          5.06
           7                         11          7.75


T = 2) have both suggested 

and m m'irX'rjhf ■" ^ome sense very s.m.lar to y 

recalled*^ is not what ”“"6 This, it will be 

others, were true Becker DcGrX^''* 'h'm 

experimental test of this hypolhesis ’ As i^UiT f" 

o < 6 < c < dbe money cufcomesand let " 1 !!‘ 

and y' = <6, h c. he the « lei a; _ (a, a, d, d), y = (c, c, b, b), 

has probability } of occurring Note that“"''H 

that the pure outcomes o. !t and 1 / are similar in the sense 

they arc associated with difTcre Ptobabilities, although 

ateo with difTcrent events Four choice sets were studied 

Pocusmgonlhtchoilrj; ^ 

' and (x,y,„). ’ ' 'nteresting comparisons 

I (“'•I') and 

a and (i,y,y) 

(».!/') and J 
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Each of 62 subjects was presented with 25 choice sets of each of four 
types, these were generated by different selections f”-- 
Let « (i if denote the number of times m theyth comparison that subject ! 

chset^rthel-elementsetandnotfrom 

let n 30 .;) denote the number of times that he 

element set and not from the two-element one i, A + n <i ;)] 

null hypothesis is correct, the -"f ™ 

IS binomially distributed with p \ F uc<.rv6*H data orovide 

pair separately. Becker et al concluded that “the ™d dma provid^ 

no reason to doubt that Debreu’s comments ™ 

population ” However. « ^j^s'^a^d tver half were less than 10), and so 
are very small (none exceeded Iv, ana 

the tests are not very powerful ^ (subiects) or 

To increase the power j and look at the statistic (, 

overy (comparison sets). When we do the latter and looK 

tn this case = [mW - -.i(!)l/A(-) + "M- where md) -1 «.('■;)> 

find that t exceeds 2 for 1 1 subjects, '®^'‘*®®" 7mueto^have^ sample stzes 
than -2 for 7 The remaining 10 ^“hjf s o 'nu' subjc'cts. and 
less than 10 Thus regularity seems to be viol t d by 
the Debreu-Savage hypothesis cannot be rejec 

When we sum over subjects, the resul g ^ This suggests 

+2 55, and +0 54. respectively. direct from 2 and 3 

that comparisons 1 and 4 may twice m the three-element 

Note that the identical ‘*'7‘'"‘“''?“tlv there is some tendency to distin- 
sets of comparisons 1 and 4, evidently Ih^e is_ 2 3 

1 < .... _£• aU., AAri>C«>nt: 


sets of comparisons 1 and 4, evi en y similarity, as m 2 and 

guish identity of the presentation ^ „ the linear programming 

Itrong expected utility By “‘“J.ng t 

methods of Davidson, Suppes, and f j the sense of Sec 

to data a strong or Fechnerian “P'f ^ ”ble maximum-hkelihood 

7 1 In order to develop a Action, he assumed the 

estimation procedure to determine the ut,hVW^^_^_^ ^ 

followmg specific form of.he~^ if ^ 


= 

" (I 


^[n{T) - l!(!!)l - J „p [«(») - l'(ie)l ‘f ^ well 
.r-u A was also motivated by its dhiWy ® probabilities 

This choice for « * .jj, pp^es of the estimated choice p 

Mosteller and Nogee ^ expected utility dife'^"” „ p I, near 

plotted as a fu"';" summarize Dolbear’s "’"''“f.lbood® function 

We do not attempt to ^ maximum-l.kel.hoon^ 

programming "1= are referred to his dissertation for 

of the utilities, readers ai 
He applied the strong expected utility model to data from an experiment
in which ten subjects were run, five of whom were graduate students and
five of whom were at an Air Force language school at Yale. Each subject
participated in three sessions. During each session the subject was
presented with 100 pairs of options, each of which had two outcomes, and
he was required to choose one from each pair. One of these pairs was
selected at random from each session, and they were run off at the end of the
experiment The smaller outcome probability was either A, J, or i The 
outcomes ranged from losing SI SO to winning S9 75, which is a consider- 
outci^er experiments with monetary 

of Dolbear compared the number 

number expected utility model with the 

The remainino ° model and by the minimax decision rule. 

Aterrerv re b .““b ‘he utility function 

r tR ^ subjects, the strong utility model correctly predicted 79 3 y 

actuand moS have ,b " hypothesis that the strong utility and 
S=rwaret ^hihty, he found that the 

level for a fifth suhieei “8"'flaance level for four subjects, at the 0 05 

Usmgtrsame t, ^heT' “ 

‘he minimax model at the 0 Ol’ level rs^oCloTul^leTs" 


References 

Abdson R M , & Bradley, R A A i 

Biometrics, 1954 10 , 487-W ^ factorial with paired comparisons 

Adams E W^SmyeyTserXELily « ™fc™„cr 1964, 2, press 

Adamt'i 

, >«« "e»«re».™ Unpublished manuscript, 

postulats et axiomes dc 1 cEamer°E^ rationnel devant le risque critique des 
Anderson N H ,& Whalen, R R ^‘i =“-546 

•wo-choiee probability leamin, situa ™ , ‘"'‘l'"'"" “"'1 sequential effects in a 
a™-, w H 

A™|P|w I Eroa A,948,58,,-,0 

joiwellate Oxford Economics Papers, 195 i. 




V, c ir, the theorv of choice m risk taking situations 
Arrow, K J Alternative approaches to the theory 

Arrl^KJ Ut.lmes, a.mudes, cho^ a « 

Atkinson, R C The observing response in oiscri 

1961,62,253-262 strone and weak conditioning 

Atkinson, R C Choice behavior and mon^^ p y haihemalical methods m 

In loan H Criswell, H Solomon and P Suppes ^3-34 

small grot, p processes d<ev.l9m. 

Audley, R 3 A stochastic model for indiviau 

67, 1-15 „i,teness axiom Ecotameinca, 1962, 

Aumann, R J Utility theory without the co p 

30, 445^62 _,„,,res of subjective probability and 

Becker, G M Decision making 

utility Psychol 2iea , 1962, 69, 136-148 models of choice behavior 

Becker; G M,DeGroot,M H , & Marschak, J Stoebas 

Be6i.rir.reilSci,1963,8,41-55(a) j ^n experimental study of some 

Becker, G M , DeGroot, M H , & ^ar^ak 
stochastic models for wagers 1 J probabilities of choices among very 

Becker, G M, DeGroot M Martha 

similar objects * MarUliak J Measuring utility by a single r p 

Becker, G M, DeGroot, M ■ 226-232 

sequential method , 'a tlaulaiton London, 1789 „f.„dfmlae 

Bentham,! The prmc, pies of morals Ccinenfcr,. ocodemtee 

Bernoulli, D Specimen theoriae ,j 5 J75.I92 (Trans by L. So 

scientiarum imperiates petropo tta^o , Society, 1W8 

£co/rofne/rica, 1954, 22, 23-3 ) Providence American M NevvYork 

Birkboff, G Lomce •>'><’'yJT^^JZme,oois,aUsncaldec,s,ons New Yo 
Blackwell, D , & Girshick, M A Theory JS 

Wjley, 1954 , ^ orderings and stochastic contributions 

Block.H D ,& Marschak, J Ran ^ Madow. & H ^QfiO^Vp 97-132- 

In I 01kin,S "sTanford Stanford Unner 

to probabiltt) ond statistic In R R Bush A ._,q pp {09-124 

Bowfr, G H Choicc-point Stanford Uruver Ublcs for 

mathematical learning theory block design " 

Bradley, R A Rank ana ysi /iMj/nerriAo. 1954, ’ -.uieness of the model 

Br^dRy'f A t^ni-ti^s^ "Pjj^rof ccmparimn, Ba-rrd. 

results on estimation and po „romn1etc block designs I The 

1955,42,450-470 Rank '!^3ii.j45 

Bradley. R A . *■ ^''^Aatnons Bioi"""*" condilioning la tcbtion 

method of paired comtxitoon ;. Rale of «■«' 

Baikc C J .Estes- " • %La>/. lOM.d-’. IS’"'*'- scw7oil. Wiler.l’iE 

Cantor. G Be«tr3i^ ^ 
ms. s**, 481-51- 




Chipman, J. S. Stochastic choice and subjective probability (abstract). Econometrica,
1958, 26, 613.

Chipman, J. S. The foundations of utility. Econometrica, 1960, 28, 193-224. (a) Also
in R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Readings in mathematical psychology,
Vol. II. New York: Wiley, 1964. Pp. 419-450.

Chipman, J. S. Stochastic choice and subjective probability. In Dorothy Willner
(Ed.), Decisions, values and groups, Vol. I. New York: Pergamon Press, 1960.
Pp. 70-95. (b)

Cohen. J Chance, skill and luck London Penguin 1960 

or measurement Psychol Pee. 
.n o^o^oucalpsychl^y. Vo, Il‘'“N;w%L”“w;if/,9M 

InR M Thrall, C H 

69„86 ’ (Eds ), Daemon Proemfi New York: Wiley, 1954 Pp 

'^°TOrh''/ Psychol .“l^t ss.TT'"'’' psychological measure- 

In”c"’w Churchmarrp'’'RlioS'(Sr)‘'Mr'”'‘‘'“" 

New York Wiley, 1959 Pp 221-232 ^ ^ definitions and theories 

Coombs, C H . & Beardslee D r . 

Thrall, C H Coombs, & R L Dam “ 

1954 Pp 255-286 4>eci5ion processes New York: Wiley, 

foe the chalys°'^”fTrcf=remi* cro"« Ld stmtoM 'rt" “f comparativejudgment 
26, 165-171 S'fnilarities data Pj^cAome/ri/to, 1961, 

Coombs, C H . & Komnnfei ^ C 

Amer J Psychol , 1958, 71, 383-389 utility of money through decisions 

° probability 
cop..,o„i„g phttie'„“"'4 ’toTs" 56 *“■ •’'''"-choice yerbal- 

“ct"c“harfhrar.%^'^..f~ 

New York Wiley, 1959 pp 233-269 definitions and theories 

utility Econometr^a, 1956, 24 subjective probability and 

avidson, D , Suppes, P & Siepel 

^Stanford Stanford Univer Press' 1957 experimental approach 

t"u ?hraU;rH"S;o"bf h"'-""' 

Wiley, 1954 Pp 159-166 ’ ) ^ecwion Procewei New York 



405 

ZZrLwofR D Ruc=,Ind.v.dua,cho.ce behave, a .beerebea. a„al,a,s 

for Research m Management Science, Umver _ources subiectives Ann Inst 
de Finetti, B La prevision ses lois cgiques H E SmaWer 

Poincare, 1937, 7, 1-68 English translaUon m H E Kyburg,^J^r^, 

(Eds) Studies m snbjedwe probabihly , r ,L„es of probability In 

de Finetti, B Recent suggestions f ol mutHe.uticnl 

1 Neyman (Ed), Pnceedtngs of the second Berkley S)^^ 
statisues and probability Berkeley ° , septal measurement of utility 

DeGroot, M H Some comments on the exp 

Behavioral Sci , 1963, 8 , 146-149 „ . , ^ p,, rhnlopische Forschung, 1931, 15, 

Dembo,T Dcr Argcr als dynamtsches Problem Fsychologncne 

Deutsch,M Trust and suspicion J Ifnlm sac Pjyc/io/, I960, 

D=utsch,M Trust, trustworthiness, and the P scale 

61, 138-140 . „„„,iamtv— an experimental study 

Dolbear, F T, Jr Individual choice under uncertainty P 

Yafe Economic Essays, 1963, . Keean Paul 1881 

l£r/p^b^d;;;;rr:n';rrgambhng .01 . j psyeho,. 1953. 66, 349- 

Edl^ds,W probability preferencesamongbetswithdidcringexpectcd values . - 

.miX’'^^;^^pro..hiyptnreienees Anter d ’ 

Edrards!w Variance preferences in 

g=;”, »■ ^ 

201-214 ountandinformationasdeterminersofsequential 

Edwards, W Reward probability,amoo . 5^ 52 ,77-188 39J (a) 

two-allernative decisions J "P / ex> S^^ir! 

Edwards,W Probability learning ml p r Famsvsorth. 

Edwards, W Behavioral decision thwiy^ In^P Reviews, Inc , 19 

Q McNcmar (Eds ), 4<nrt Fev of y t, j Rev 1962.69, 

Hd^v^rS^^v'^SUtiveprobabildiesi..^"^^ 

109-135 New York "’iley. 19M ^ ^b,,,„es m >«bal 

Kamn. Ik 

conditioning ^^’T. _,^e1 for choice behavior In ^ Stin^ord 

Estes, W K A random wa k m'^;';^';^,,,,, .aelnl stlentet. 

P Suppes (Eds). Afu'^e™ p. 265-276 a , silosuon m ‘cP^ 

Stanford Unircr Pre«. I960 PP „ „,bal eoniliuon.nf situs 

Estes. \V K . i. Slraughan. J pj,rM , 1954. 47. — „R„d S-ar'et J 

..o 

rarR-n-odelforooleredow-scalingbyeom.^ 

Lriko. 1959.24.157-165 



4o6 


PREFFRFNCE, UTILITY, AND SUnjFCTUF I’ROnAnillTY 


Fisher, I The nature of capital ond income New York Macmillin Co , 1906 
Ford, L R, Jr Solution of a ranking problem from binary comparisons Amer. 

Math Mon , Herbert Ellsworth Slaught Memonal Papers, 1957, 64, 28-33 
Friedman, M P , Burke, C J , Cole, M , Keller, L . Millward, R B , i Estes, W K 
Two choice behavior under extended training with shifting probabilities of reinforce- 
ment In R C Atkinson (Ed), Studies in mathematical psycholoey Stanford: 
Stanford Univer Press, 1964 Pp 250-316 

Galanter, E H , <5: Smith, WAS Some experiments on a simple thoucht problem 
/ Pj^c/io/, 1958, 71, 359-366 * 

Gardner, R A Probability-learning with (wo and three choices Amer J Psychol, 
1957,70,174-185 ■' 

°'l9?r26 Threshold m choice and Ihc theory of demand rcommelrica, 

Gerlach, Muriel W Intmal mrasuremeiu of sobjecthe mofnlfuJes B/r/i subliminal 
aijfsrnces Ph D dissertation Stanford Stanford Univer , 1957 

’^'’®“““''““°'"'«'“'°fr>onsense syllables J Cener/ePsye/ro/, 1928,35, 

p;£:'i955,“t™;6 

MDecutmn'sVn^a'sr ^ ^ ■* ** Requisition and cxtinelion of Verbal 

exjKtations m a situation analogous to conditioning J ,sp Payc/to/, 1951, 42, 

° m9,'6^, “90 °9‘'4‘‘’ ‘■y Antartcan horse raee bettors Amor J Ps)cbcl , 

“srpoS^a^t r.fum°pt ““r;r:Str“^ 

rSwOT of Thurstone-s learning function P^ellrr*,,, 1953, 

of b,nar7 symborTcay, PryS'°r9f3,‘4VlS^7r' °f a 'a"'!""' 

Hausner, M Multidimensional utilities In R M tv, n 

Her„,‘f;>:t MlTrTTn ' 

metrica, 1953, 21, 291-297 omatic approach to measurable utility Econo- 

^ of utility J “ higher-ordered metric scale 

^ comes J /’crjo/,a/,ryn953' 2^, probability and desirability of out- 

^ Psychol, \95^ 71 ^ 152-163*^' concepts of discrimination and preference Amer J 

169-187 ° probabilities and sequences Amnis of Math , 1941, 42, 



REFERENCES 


407 

Krar,C H.Pra..,J W,&S=«.A ,„.u,..ve prcbab.hty on flnUe s=. A.n 

KytrH“E':?r’:rs™’birH’E (Hds) S.,e,..s.^je..epro^aM.:y New 

L:ct„.''H:^'iTdwards, W Supp,en.e„.a,y eepc. un.earn.ng .be gan,b,ePa 

Loirs' D of d.sc.,n„na.,on — 

Lu«: R*D ' A probab.bs.,e theory of u.,b.y "Z- S “’rfw^y, 1M9 
Luce.R D Indwidual choice behmior . L j a„ow. S Karim, & P 

Luce, R D Response latencies ‘ ]959 Stanford Stanford 

Suppes (Eds ), Malhemoucal methods m the socto! sciences. 

ZZr TlZtfXVomes and decisions tnttoditelion ond cnuco, snrcey 

Lu"rrS?™:rr';LuanLs conpm. o 

LuSitrr^lXn^ralLt a ;idiZr of cio^ra.ive behavior / Confiict 

J?«o/«nort, I960. 4, 426-430 lAndon Macmillan, 1958 

Majumdar,T Themensnremenlofmli y ^ jna privilege' on the slated 

Ma'iks,RoseW The elfect of probab. ity, d«.rab.l y._^jj , ^ 

expectations of children J and measurable utility Econo- 

MarLhak, J Rational behavior, uncertain prospects 

melrica, 1950, 18, 1 1 1-141 „„,„y indicators In K J Arro , 

Marschak, J B'"4iy<'’Of 'O™ r9^„7,„n.,en; methods in the social sciences, 

iZZ ZJt7Lisi f^;j:to?guSna..erna..ves .m d Esycho, . 

McGlo.hlm,W H Stability of choices among u 

1956,69,604-615 « nf relations Free //or dcarf ,1951,37, 

Menger.K Probabilistic theories of relalio^ Thrall. C H Coombs. & R L 

Milnor,J Gamesagainstnature^ R 155 ^ ’’P ‘"^^omparison choices 

(Eds ), 'Seditions for triads of paired comp 

Morrison, H w tan . i r nolit Econ , 

Psychometnka, 1963. 28 3^^ n,ental measurement of uti . y P 

Mostcller, F , & Nogee, P .h-nrem J Econ 

1931,59,371-404 „„ic on Chipman's representative 

”rro:-,l"961,Ll7tl\ ^ choitre behavior and regard Slrucmre . 4, a, 4 

Te4e^^9M,L^S ,.,snydam,M M K! 

Ld losses and event probability ^ 


66, 521-522 „ „ person games 

Nash,J F Equilibrium points m pe 

tA aR-il9 _ . of nonreinic 


J, librium points - mimbcr of altemalive 

36,’48-49 ■ „f type of nonreinforeemen a d ,, 5 ^^ ^ 209- 

Neimark, Edith D ° odtSning sitna.ions J exp r > 

responses in two 



PRrFTRFNCC, UTII IT^ , AND SURJPCTIVF PROnARILlTY 

Ncimark, Edith D , & Shuford, E Comparison of predictions and estimates in a 
probability learning situation J exp 1959, 57, 294-298 

Newman, P , & Read, R Representation problems for preference ordering J Ccon 
Bchaiior, 1961, 1, 149-169 ° 

Nicks, D C Prediction of sequential tno-choice decisions from e\cnl runs J exp 
Psychol , 1959, 57, 105-114 ^ 

Papandreou, A G , Saucriender, O H.Dro»nlcc,0 H . Ilureiaz, L , A FranUm, W 

In Ccommks.rnT. 

‘’“m S’' <■»» unn lnlro,l,i:hnt uUa iclrmn socMr 

Milan. Italy Socicta Editricc Ubraria. 1906 

Gmndh^en ,ln„ n,o,„ M„„ns 

Phv/ V ^ dK Stalistischcn Inslituls dcr Unucrsitat Wien New Feilre Nr 1, 
Physica Verlag Wurzburg 1959(a) ^ 

'"'lom/i’e! n ’"'““remenl-applicalions to ulility hlaral Research 

1964 Pp'^Ljo* W'l'y. 

"■ wS; “c e°r;^'T:r,9t 

'187-201 pmblmg decisions Ps)ehol TJn . 1962, 69, 

a^iffa, H , Sc Schlairer, R Applied staumealeleeis, on Iheor) Boslon Harvard Univer , 

and hher logical essays ''new'Sk ' Harco fmndauons of nialhemalles 

&E aalanter(Eds) Ho^ioct “ ‘-uce, R R Rush, 

1963 Pp 493-579 f'"alhemalieal psychology, II New York Wilcy, 

Roytn!'H'’L"7^s*"l^^^ Wiley, 1961 

^416 pa Illy matching J exp , 1959, 57, 413- 

Stanford StanforVunwc?', P^^f^rences Ph D dissertation 

s‘"'dV’4 ' 1951,46,55-67 

pers^ non zero sum games / Conflict descriptive aspects of two* 

Sh^°S'’ a”*’ “.'ll3-12r"‘‘‘“'°"“' “>*“ “f •h'ories of measurement / Symbaha 

Shack e,G L S B,pe„ii„™ 

Shackl G L S Vncerlainlo 

1,55 ynecoiuna^^ Cambridge Cambridge Univer Press, 



409 

REFERENCES 

Shuford, E H /f comparnon of 

eveou Rep No 20 . The Psychomemc Lab . Univer ^orth 

Shuford, E H Percentage estimation of P''°P°'^'°"/A^L 

exposure time, and task J exp Psycho! ,19 ’ ’ . ,g „„„ chsmbutions 

Shuford. E H Applwouon, of Boyeswn procedures based on 

Rep No 31, The Psychometric Lab, Univer “f^orth ji,s effect of stmoliis 

Shufo'^rd.E H,&Wiesen,R A 

distribution and exposure time Rep No 23, Th y gy 

S.e“A method for obtaining an ordered metric scate Psyehontetr.ha, 1956. 21, 

Siegel, S Level of aspiration and decision “ Ja'bie Mate’ behavior 

Siegel, S Theoretical models of choice and j 559 ^ 24, 303-316 

in the two-choice uncertain outcome situation sy j of reinforcement 

Siegel, S Decision making and learning under varying conditions 

Ann N Y Acad Sc , 1961, 89, 766-783 . York 

, o e. T? leei, T P Rarpamint 


Siegel, s Decision maKuig o - 

Ann N Y Acad Sc , 1961, 89, 760-783 New York 

Siegel, S ,& Fouraker, L E Bargaining and g p 

McGraw-Hill. I960 . . choice uncertau 


Siegel, S , & EouraKer, l c — o a 

McGraw-Hill. I960 behavior m a two choice uncertain 

Siegel. S , & Goldstein, D A Dec‘Sion ^kmg 

outcome situation J and strategy behavior a general 

Siegel. S . & MoMKhael, Julia E , of Psychology. The Pennsylvania 

model for repeated choices Res Bull r 

State Univer , 1960 Warcaw Poland, 1958 

Sierpmski.W Cardinal and Ordinal Numbers > 1955, 69,99- 

Simon, H A A behavioral model of rational choice C»<. 

a ihe structure of the environment Psychol Rev , 
Simon, H A Rational choice and the str 

1956,63, 129-138 Rauo scales and category scales for a dozen percept 

Stevens, S S , & Galantcr, ^ ^ , in^09 Pm- 

continua J exp Psychol . 1957, 54, ” decision making 

Suppes, P The role of subjective 1 %,„„icol stonsucs 

lidtngs of the third Berkeley V”/”’”"” "j; ,956. 5, 61-73 Also m R D Lu^ 

1954-1955, Berkeley ° mothemolieol psychology 

R R Bush.&E Galanlcr(Eds).KTOil.»yx 

York Wiley, 1964 Pp 503-515 Van Nosttand, 1957 

Suppes, P iZodnetwn to logic N** 1961. 29, 18^202 
Su'p'pes, P Bebaviorisnc .node, s for mo, ..person .nteroen 

Suppes, P , & Atkinson, R measurement 

Stanford Stanford Univer Press. j ,he experimen 

Suppes. P , & Walsh, Karo, A non b^r m 

of utility ■Be*"''’™'-'"’,' ’ omatiialion of utility based 
Suppes, P .& Wind, Muriel An ,955 . ,, 259-270 ^ r Bush, 

utility differences measurement theory In R » ^ Wiley. 

Suppes, P , & Zm„d, >^^^,ZTof Mothemnneo, PsyeholoSI- 

E Galanter(Eds) Honao ,„,n 3 469-193 

1963. pp '-’®_„,,mrnmg function J Gen '”5/, 0. 237-253 

Thurstone.LL ‘ diclfon of choice PiT''’'’"'"" “dhod of game Jopon J 

TodaTl'’ Measurement of mtoilive-probabiUly by 

°Fs^ehol, 1951.2^^9-40 



^10 PREFiRENCr, UTILITY, AN» SUtlJECTIM I'ROnAntLlTY 

Toda, M Subjective inference \s objective inference of sequential dependencies 
Japan rs)chol Res , 1958, 5, 1-20 

Tornqvist, L A model for stochastic <leasion making Cowles Commission Discussion 
Paper, Economics 2100, 1954 (duplicated) 

Uzawa, H Prcrcrcncc in rational choice in the theory of consumption In K J Arrow, 
S Karlin, & P Suppes (Eds ), Afalhemaiical Afethmls In the Social Sciences, 1959 
Stanford Stanford Univcr Press, I960 Pp 129-148 
Valavanis Vail, S A stochastic motlel for utilities Univcr of Michigan, 1957 
(duplicated) 

von Neumann, J Zur Thcone der Gcscllschaftsspielc Math Annalen, 1928, 100, 
295-320 English translation in A W Tucker & R D Luce (Eds ), Contributions 
to the theory of games, IV Princeton Princeton Univer Press, 1959 Pp 13-42 
von Neumann, J , & Morgenstern, O Theory of gomes and economic behactor 
Princeton Princeton Univer Press. 1944, 1947, 1953 
Wiener, N A new theory of measurement A study in (he logic of mathematics 
Proc of the London Math Soc , 1921, 19, 181-205 
Wiesen, R A Bayes estimation of proportions The effects of complete and partial 
feedback Rep No 32. The Psychology Lab , Univer of North Carolina, 1962 
Wiesen, R A , & Shuford, E H Bayes strategies os adopthe behanor Rep No 30, 
The Psychology Lab , Univer of North Carolina, 1961 
Wold, H , & Jureen, L Oemand analysis, a study In econometrics NewYork' Wiley, 



20 


Stochastic Processes 

J. Laurie Snell 
Dartmouth College 



Contents

1 Foundations
  1.1 Definition of a stochastic process
  1.2 Classification of stochastic processes
  1.3 Some probability concepts
  1.4 Examples and applications

2 Independent Processes
  2.1 Limit theorems
  2.2 Properties of sample sequences
  2.3 Examples and special problems

3 Markov Processes
  3.1 Classification of states for finite chains
  3.2 Transient behavior for finite chains
  3.3 Ergodic behavior for finite chains
  3.4 Limit theorems for ergodic chains
  3.5 Applications and examples of finite chains
  3.6 Denumerable chains with discrete time
  3.7 Recurrent denumerable chains
  3.8 Applications and examples of denumerable chains
  3.9 Continuous time Markov chains with a finite number of states
  3.10 Continuous time Markov chains with a denumerable number of states
  3.11 Applications and examples of continuous-time chains

4 Chains of Infinite Order
  4.1 An application

5 Martingales

6 Stationary Processes
  6.1 Examples and applications

7 Concluding Remarks

References



Stochastic Processes 


A stochastic process in its most general form is a mathematical model
that can be used to describe a sequence of experiments, each of which
depends in some way on chance. The subject had its origin in the study
of simple games of chance played sequentially, but it is now applied in
almost every field of knowledge. In physics one studies, for example,
the number of emissions from a radioactive source or the motion of a
particle moving randomly in a liquid or gas. In genetics, the frequency
of different types of genes in a population through successive
generations determines a stochastic process. In engineering, the study
of queues or waiting lines as a stochastic process has led to important
improvements in services. Recently, stochastic processes have been used
in psychology to describe the behavior of individuals faced with making
a sequence of simple decisions. These are but a few of the diverse
applications that have been made of stochastic process theory.
To apply the theory fruitfully it is necessary to have a phenomenon
that takes place over an extended period of time and that is understood
sufficiently well to make certain general assumptions of a
probabilistic nature about the evolution of the process. This is a
difficult task in the very complex phenomena studied by the behavioral
scientist. In learning theory this was accomplished by restricting the
type of learning studied to very simple situations. Although
probability ideas have been widely studied in psychology, it seems to
me that only in the areas of learning theory and applications of
information theory, for example, to language, have stochastic processes
been used significantly. Since many expository articles have been
written on these applications (see, for example, Luce (1960) and
Chapters 9, 10, and 13 of this Handbook), I shall attempt in this
chapter simply to give some idea of the types of stochastic processes
that have been studied and the kinds of problems that have been solved.
It must be realized that, although many of these problems arose
naturally from applications, a number arose because of a
mathematician's search for an interesting problem or his desire to give
a new approach to a subject already studied. It is not possible to
separate out these motives, and no attempt is made, therefore, to
convince the reader that each result has immediate application to
psychology. At the end of the discussion of each section some
indication has been given of one or more applications and special
processes that have been found important in applied probability.

In general, emphasis will be placed on giving the kind of results
available concerning the various types of processes, and few proofs
will be given. There are now excellent treatises on stochastic
processes. The books of Doob (1953) and Loève (1960) are the most
complete books available on stochastic processes, but they presuppose a
great deal of mathematical background. The books of Rosenblatt (1962)
and Parzen (1962) require less mathematics and give a good general
introduction to the theory of stochastic processes. The best
introduction to stochastic processes, as well as to probability in
general, is to be found in the book of Feller (1957). This book
presupposes very little mathematics, includes numerous applications,
and gives the reader a remarkable insight into the theory and
applications of probability theory.


1. FOUNDATIONS

1.1 Definition of a Stochastic Process

We consider an experiment that is the result of a sequence of trials,
each of which can be described in terms of a set $A$ of possible
outcomes. The set $A$ is the same for each experiment. The history for
the entire experiment may be represented by a sequence
$\omega = (a_1, a_2, \ldots)$, where the element $a_j$ of $A$
represents the outcome of the $j$th trial. For example, in a sequence
of tosses of a coin one usually takes $A = \{H, T\}$, and a typical
point $\omega$ would be $\omega = (H, H, T, T, T, H, T, \ldots)$. We
denote by $\Omega$ the set of all sequences $\omega$. It is convenient
at times to think of the entire sequence of observations as a single
experiment in which the outcome is an element of the set $\Omega$. The
sequences $\omega$ are then called points of the space $\Omega$, and
$\Omega$ is called the sample space. By an event $E$ we mean a subset
of the sample space $\Omega$. We say that the event $E$ occurs if the
outcome $\omega$ turns out to be an element of the set $E$. For
example, if a coin is tossed a sequence of times, the event "heads on
the first toss" consists of all sequences that have $H$ as their first
element, and the event "heads on every toss" consists of the set with
only one element $\omega = (H, H, H, H, \ldots)$. Our aim is to assign
probabilities to events. Of course, it would be most desirable to
assign probabilities to all events, that is, to all subsets of the
sample space. This turns out not to be possible, at least not if we
want the probabilities to have certain simple and intuitive properties.
This is a technical point which is not discussed here. It is possible
to assign probabilities to a wide class of events that seems to include
all events of interest.

The probabilities for these events are determined by certain basic
probabilities that are now described. We denote by $X_j$ the outcome of
the $j$th experiment. We can describe $X_j$ as a function defined on
the set $\Omega$ of all sequences: for $\omega = (a_1, a_2, \ldots)$,
$X_j$ has the value $a_j$, that is, $X_j(\omega) = a_j$. Thus $X_j$
simply picks out the $j$th element in the sequence. The functions
$X_1, X_2, \ldots$ are called outcome functions.

Definition 1. A basic event is an event of the form
\[ \{\omega \mid (X_1(\omega) = a_1) \wedge (X_2(\omega) = a_2) \wedge \cdots \wedge (X_n(\omega) = a_n)\}. \]

The notation $\{\omega \mid \ldots\}$ means "the set of all $\omega$
such that $\ldots$". A basic event is thus determined by specifying the
first $n$ outcomes.

It will be convenient from now on to simplify our notation slightly.
We shall write the event $\{\omega \mid X_n(\omega) = a_n\}$ simply as
$X_n = a_n$. Our basic probabilities will be written as
\[ \Pr[(X_1 = a_1) \wedge (X_2 = a_2) \wedge \cdots \wedge (X_n = a_n)]. \]

It is sometimes more convenient to specify the process in terms of
conditional probabilities as follows. By
\[ \Pr[X_n = a_n \mid (X_{n-1} = a_{n-1}) \wedge \cdots \wedge (X_1 = a_1)] \]
we denote the probability that the $n$th outcome is $a_n$, given that
the outcomes of the first $n - 1$ experiments were
$a_1, \ldots, a_{n-1}$. Instead of the basic probabilities, we can
equivalently specify

(i) $\Pr[X_1 = a_1]$,

(ii) $\Pr[X_n = a_n \mid (X_{n-1} = a_{n-1}) \wedge \cdots \wedge (X_1 = a_1)]$.

We shall also wish to consider the case in which the set $A$ of
possible values for each trial is a denumerable set, that is, a set
that can be put in one-to-one correspondence with the integers. This
requires no change in our procedure. If, however, the set of possible
outcomes is an interval or the whole real line, we shall normally want
the probability of any single number to be zero, and the basic
probabilities, as defined earlier, are not useful. Instead we take
basic probabilities of the form
\[ \Pr[(X_1 \le y_1) \wedge (X_2 \le y_2) \wedge \cdots \wedge (X_n \le y_n)] \]
for each $y_1, \ldots, y_n$ in the interval of possible outcomes.

So far we have assumed that we are observing a sequence of trials.
We often want to consider situations in which the outcome can be
observed at any instant of time in some interval of time
$0 \le t \le T$. For example, we might be observing the blood pressure
of an animal while it is undergoing some experiment. If $T$ is the
length of time of the experiment, we would like our process to record a
number for each time $t$ in the interval $0 \le t \le T$. If we were
actually recording this experiment, we might have an instrument that
records the blood pressure in the form of a graph. Mathematically, we
think of a possible outcome for the entire experiment as a function
$\omega = a(t)$, where $a(t)$ is the pressure recorded at time $t$. Our
sample space $\Omega$ in this situation is the collection of all
possible functions $a(t)$ with domain the interval $[0, T]$. An event
is a subset of these functions, and we are interested in the
probability that the outcome function lies in various subsets of
functions. For example, the event that the blood pressure continuously
rises is the subset of all increasing functions, and the event that it
varies continuously is the subset of all continuous functions. The
problem of assigning a measure to events in this situation is
considerably more difficult than when the sample space consists of
sequences, but the measure is again determined by basic probabilities.
We denote by $X_t$ the function with domain $\Omega$ that gives the
outcome at time $t$, that is, $X_t(\omega) = a(t)$. If the possible
outcomes at any time $t$ are from a denumerable set, then the basic
probabilities are of the form
\[ \Pr[(X_{t_1} = a_1) \wedge (X_{t_2} = a_2) \wedge \cdots \wedge (X_{t_n} = a_n)], \]
where $t_1 < t_2 < \cdots < t_n$ is any finite sequence of times in the
interval $[0, T]$. If the outcomes are from an interval of real
numbers, the equalities are replaced by inequalities, as in the case
of a sequence of trials.

We may also allow the process to continue indefinitely, in which case
the time interval is unbounded. We shall often wish to consider
functions derived from the outcome functions. For example, let
$X_1, X_2, \ldots$ be the numerical outcome functions of a stochastic
process. If we form the sums $S_1, S_2, \ldots$ defined by
$S_n = X_1 + \cdots + X_n$, these are again functions defined on the
space of sequences $\omega$, and we can find the basic probabilities
\[ \Pr[(S_1 = a_1) \wedge (S_2 = a_2) \wedge \cdots \wedge (S_n = a_n)]. \]
However, the functions $S_1, S_2, \ldots$ do not represent the outcome
of the $n$th trial in the original experiment. In general, if we have a
sequence of real-valued functions $X_1, X_2, \ldots$ defined in such a
way that we can find the basic probabilities
\[ \Pr[(X_1 = a_1) \wedge (X_2 = a_2) \wedge \cdots \wedge (X_n = a_n)], \]
we shall say that $X_1, X_2, \ldots$ is a stochastic process. We make a
similar generalization when time is continuous or when the possible
outcomes are an interval.

The mathematical aspects of the theory of stochastic processes consist
primarily of developing methods to obtain the probabilities of
interesting events from the given basic probabilities. The basic
probabilities are assigned by the scientist on the basis of his
information about the phenomena being studied. The most general
stochastic process is too general to be studied fruitfully. Therefore
only certain limited classes of stochastic processes have been studied,
these classes being determined by restrictions placed on the basic
probabilities.
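
The passage from the conditional specification (i) and (ii) to the basic
probabilities, and from there to the basic probabilities of the sums
S_n, can be sketched in a few lines of Python. This is only an
illustration under assumptions that are not in the text: the function
names are hypothetical, and the example process is a coin with outcomes
+1 and -1 whose tosses repeat the previous outcome with an assumed
probability 0.7.

```python
from itertools import product

def basic_probability(outcomes, initial, conditional):
    """Pr[(X_1 = a_1) and ... and (X_n = a_n)], built by the chain rule."""
    prob = initial(outcomes[0])                 # (i)  Pr[X_1 = a_1]
    for n in range(1, len(outcomes)):
        # (ii) Pr[X_n = a_n | all earlier outcomes]
        prob *= conditional(outcomes[n], outcomes[:n])
    return prob

def sums_basic_probability(sums, initial, conditional):
    """Pr[(S_1 = s_1) and ... and (S_n = s_n)] with S_n = X_1 + ... + X_n."""
    # The sums determine the outcomes: X_1 = s_1 and X_k = s_k - s_{k-1}.
    outcomes = tuple(s - prev for prev, s in zip((0,) + tuple(sums), sums))
    return basic_probability(outcomes, initial, conditional)

# Assumed example process (not from the text).
def initial(a):
    return 0.5 if a in (1, -1) else 0.0

def conditional(a, history):
    if a not in (1, -1):
        return 0.0
    return 0.7 if a == history[-1] else 0.3

p_path = basic_probability((1, 1, -1), initial, conditional)
p_sums = sums_basic_probability((1, 2, 1), initial, conditional)
total = sum(basic_probability(w, initial, conditional)
            for w in product((1, -1), repeat=3))
```

Here `p_path` and `p_sums` describe the same event in two ways, once
through the outcomes and once through the partial sums, and `total`
adds the basic probabilities of all sequences of length three.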


1.2 Classification of Stochastic Processes

We have already discussed two methods of classification of stochastic
processes. The first is according to the nature of the trials being
observed. The first case considered is called discrete time and the
second continuous time; the distinction is whether we observe the
experiment at a sequence of times or observe it continuously over an
interval of time. The second classification is according to the nature
of the possible outcomes at any particular time. If the possible
outcomes form a finite or denumerably infinite set, we say that we have
a discrete space experiment. If the possible outcomes form an interval,
we say that we have a continuous space process. It is sometimes also
useful to consider experiments in which the outcomes are vectors. For
example, the outcomes at each time may be the responses of a group of
individuals.

We shall consider primarily discrete time-discrete space processes, but
we give some consideration to discrete time-continuous space,
continuous time-discrete space, and continuous time-continuous space
processes.

For purposes of reference we give the definitions of the main types of
processes here and then comment on them in more detail. We consider
first discrete time-discrete space processes.

Definition 2. A stochastic process $X_1, X_2, \ldots$ is an independent
process if for any $a_1, a_2, \ldots, a_n$
\[ \Pr[X_n = a_n \mid (X_{n-1} = a_{n-1}) \wedge (X_{n-2} = a_{n-2}) \wedge \cdots \wedge (X_1 = a_1)] = \Pr[X_n = a_n]. \]

Definition 3. A stochastic process $X_1, X_2, \ldots$ is a Markov
process if for any $a_1, a_2, \ldots, a_n$
\[ \Pr[X_n = a_n \mid (X_{n-1} = a_{n-1}) \wedge (X_{n-2} = a_{n-2}) \wedge \cdots \wedge (X_1 = a_1)] = \Pr[X_n = a_n \mid X_{n-1} = a_{n-1}]. \]

Definition 4. A stochastic process $X_1, X_2, \ldots$ is a stationary
process if for any $a_1, a_2, \ldots, a_n$ and any $h$,
\[ \Pr[(X_1 = a_1) \wedge (X_2 = a_2) \wedge \cdots \wedge (X_n = a_n)] = \Pr[(X_{1+h} = a_1) \wedge (X_{2+h} = a_2) \wedge \cdots \wedge (X_{n+h} = a_n)]. \]

Definition 5. A stochastic process $X_1, X_2, \ldots$ is a martingale
if for any $a_1, a_2, \ldots, a_{n-1}$
\[ E[X_n \mid (X_{n-1} = a_{n-1}) \wedge (X_{n-2} = a_{n-2}) \wedge \cdots \wedge (X_1 = a_1)] = a_{n-1}. \]
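
The martingale condition of Definition 5 can be checked mechanically
for a small example. The sketch below is an illustration and not part
of the original text: it assumes a gambler's fortune built from
independent plays (the step distributions and function names are
hypothetical), enumerates the basic probabilities exactly, and tests
whether the conditional expectation of X_n given each possible past
equals the last observed value. It checks the condition at the single
time n (with n at least 2), not at every earlier time.

```python
from fractions import Fraction
from itertools import product

def basic_probabilities(n, step_dist):
    """Map each fortune path (a_1, ..., a_n) to its basic probability.

    Assumes X_k = X_{k-1} + (independent step), starting from fortune 0.
    """
    joint = {}
    for steps in product(step_dist, repeat=n):
        prob, fortune, path = Fraction(1), 0, []
        for s in steps:
            prob *= step_dist[s]
            fortune += s
            path.append(fortune)
        joint[tuple(path)] = prob
    return joint

def is_martingale(n, step_dist):
    """Check E[X_n | past] = a_{n-1} for every past of positive probability."""
    joint = basic_probabilities(n, step_dist)
    for past in {path[:-1] for path in joint}:
        mass = sum(pr for path, pr in joint.items() if path[:-1] == past)
        if mass == 0:
            continue  # conditioning event has probability zero
        expected = sum(path[-1] * pr for path, pr in joint.items()
                       if path[:-1] == past) / mass
        if expected != past[-1]:
            return False
    return True

fair = {+1: Fraction(1, 2), -1: Fraction(1, 2)}        # fair game
favorable = {+1: Fraction(3, 5), -1: Fraction(2, 5)}   # game biased toward the gambler
```

With these assumed step distributions, `is_martingale(3, fair)` holds
and `is_martingale(3, favorable)` does not, since in the second game
the expected fortune after the next play exceeds the current fortune.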

To say that a process is an independent process is to say that the
knowledge of the outcomes up to any time can have no influence on our
predictions for later outcomes. The basic probabilities for such a
process are determined by the distribution of each of the random
variables $X_1, X_2, \ldots$, that is, by giving for each $n$ the
function $p_n$ defined by
\[ p_n(a_j) = \Pr[X_n = a_j]. \]
An important special case is the one in which the random variables
$X_1, X_2, \ldots$ have a common distribution, that is, the functions
$p_n$ are the same for all $n$. To specify this process completely we
need only give a single function $p$.

The independent process is appropriate when previous outcomes have no
influence on future outcomes. It was the first model studied, having
its origins in questions about a gambler's fortune when he makes a
sequence of plays in a game of chance, and it is the basis of classical
statistics.

The Markov process weakens the assumption that the past has no
influence. We allow predictions to depend on the past, but when the
last outcome is known we assume that the knowledge of any previous
outcomes does not change our predictions. The basic probabilities here
are all determined by the initial distribution $p$ defined by
\[ p(a_j) = \Pr[X_1 = a_j] \]
and the transition probabilities
\[ p_{ij}^{(n)} = \Pr[X_n = a_j \mid X_{n-1} = a_i]. \]
As in an independent process, an important special case is that in
which these probabilities do not depend on $n$. We then say that the
Markov process has stationary transition probabilities.
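
To make concrete how an initial distribution and stationary transition
probabilities determine every basic probability of a Markov process,
here is a minimal sketch. The two-state chain, the numbering of states
as 0 and 1, and the function names are assumptions chosen for
illustration only.

```python
def path_probability(path, p0, P):
    """Pr[(X_1 = a_1) and ... and (X_n = a_n)] for a Markov chain.

    Equals p(a_1) times the product of the transition probabilities
    along the path.
    """
    prob = p0[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob

def distribution_at(n, p0, P):
    """Distribution of X_n: apply the transition matrix n - 1 times."""
    dist = list(p0)
    for _ in range(n - 1):
        dist = [sum(dist[i] * P[i][j] for i in range(len(P)))
                for j in range(len(P))]
    return dist

# Assumed example: start in state 0 with certainty.
p0 = [1.0, 0.0]
P = [[0.9, 0.1],
     [0.5, 0.5]]

prob_001 = path_probability((0, 0, 1), p0, P)
dist_3 = distribution_at(3, p0, P)
```

Only the vector `p0` and the matrix `P` had to be supplied; every other
probability about the process is computed from them, which is the
reduction in required information that the Markov assumption provides.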

In defining a stationary process we do not directly consider the
influence of the past. Instead we assume that the process is such that
if we arrive at any time and make a sequence of observations, our
predictions about these observations are independent of the time that
we happen to arrive.

To specify a stationary process we need to know all the basic
probabilities. The condition of stationarity imposes a condition on
these probabilities, but unlike the condition of independence and the
Markov condition it does not decrease the information necessary to
prescribe the process completely.
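
The stationarity condition can be tested directly on basic
probabilities. The following sketch, under assumptions that are not in
the text (a two-state Markov chain with an assumed transition matrix,
states numbered 0 and 1, hypothetical function names), compares the
probability of the same pattern of outcomes observed starting at
different times. Exact fractions are used so that equality can be
tested without rounding.

```python
from fractions import Fraction as F

def step(dist, P):
    """Distribution one trial later: dist multiplied by the transition matrix."""
    k = len(P)
    return [sum(dist[i] * P[i][j] for i in range(k)) for j in range(k)]

def shifted_basic_probability(path, h, p0, P):
    """Pr[(X_{1+h} = a_1) and ... and (X_{n+h} = a_n)] for a Markov chain."""
    dist = list(p0)
    for _ in range(h):
        dist = step(dist, P)
    prob = dist[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob

P = [[F(9, 10), F(1, 10)],
     [F(1, 2), F(1, 2)]]                 # assumed transition matrix
stationary_start = [F(5, 6), F(1, 6)]    # unchanged by one step of this P
other_start = [F(1), F(0)]               # always begin in state 0
path = (0, 0, 1)
```

Started from `stationary_start`, the pattern `path` has the same
probability for every shift `h`, as Definition 4 requires; started from
`other_start`, the probability changes with `h`, so the same transition
probabilities then give a Markov process that is not stationary.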

The martingale process is again permitted to have dependence on the
complete past. We are now dealing with a real-valued process which
satisfies a condition on the expected value of the next outcome given
all of the past outcomes. Here
\[ E[X_n \mid (X_{n-1} = a_{n-1}) \wedge \cdots \wedge (X_1 = a_1)] = \sum_j a_j \Pr[X_n = a_j \mid (X_{n-1} = a_{n-1}) \wedge \cdots \wedge (X_1 = a_1)]. \]
The martingale condition has the following interpretation. If we think
of $X_1, X_2, \ldots$ as a gambler's fortune after each of a sequence
of plays, then the process is a martingale if the game is fair in the
sense that his expected fortune after the next play is just what it was
on the last play. The interest in martingales lies not so much in their
use as a model for a specific experiment as in the fact that a number
of mathematical results hold for martingales that apply to quite
different processes.

It is important to realize that for independent and Markov processes
the assumptions have also the advantage that they cut down very much
the amount of information that the scientist has to supply in order to
determine the process completely. It is often useful to use a specific
type of process as an approximation to reality even though one knows
that the assumptions are not in fact completely realized. An
interesting example of this is Shannon's famous approximations to
speech. Shannon (1948) first considered language as an independent
process with a common distribution. The elements of the space are
$\{a_1, a_2, \ldots, a_n\}$, and the outcomes are the letters of the
alphabet and a space mark; the common distribution is the relative
frequencies of these outcomes in the English language. He then obtained
as a realization of this process the sentence,

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI 
ALHENHTTPA OOBTTVA NAH BRL 


He next assumed as a model a Markov process with stationary transition
probabilities $p_{ij}$ obtained as the relative frequency with which
letter $a_j$ follows the letter $a_i$ in an English sentence. He
obtained

ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE
TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE

He next considered processes in which the past through two letters is
taken into account. In this case he obtained

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF
DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE

This is not a Markov process as defined above, but by considering the
outcome to be a pair of letters it may be so considered. All the
processes considered in these approximations to English are in fact
also stationary processes. Much of information theory is based on the
assumption that language is a stationary process, where no assumption
is made concerning the dependence on the past. In this example, we see
dramatically the improvement made by weakening the assumptions, and
also the difficulty in determining the process when we do so.
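
The first two of Shannon's approximations can be imitated on a small
scale. The sketch below is an illustration only: it estimates letter
frequencies and letter-to-letter transition frequencies from a short
assumed sample string (not from real English statistics), and then
draws a realization of the independent process and of the Markov
process. All names and the sample text are hypothetical.

```python
import random
from collections import Counter, defaultdict

def letter_frequencies(text):
    """Relative-frequency counts for the common distribution."""
    return Counter(text)

def transition_counts(text):
    """counts[a][b] = number of times letter b follows letter a."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def sample_independent(freq, n, rng):
    """Realization of an independent process with a common distribution."""
    letters, weights = zip(*freq.items())
    return "".join(rng.choices(letters, weights=weights, k=n))

def sample_markov(freq, trans, n, rng):
    """Realization of a Markov process with stationary transition probabilities."""
    letters, weights = zip(*freq.items())
    out = [rng.choices(letters, weights=weights)[0]]
    while len(out) < n:
        followers = trans.get(out[-1])
        if not followers:
            # Letter never seen with a successor: draw afresh.
            out.append(rng.choices(letters, weights=weights)[0])
            continue
        nxt, wts = zip(*followers.items())
        out.append(rng.choices(nxt, weights=wts)[0])
    return "".join(out)

# Assumed sample text standing in for "the English language".
SAMPLE = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG AND THE CAT SAT ON THE MAT "
FREQ = letter_frequencies(SAMPLE)
TRANS = transition_counts(SAMPLE)

rng = random.Random(0)
independent_text = sample_independent(FREQ, 40, rng)
markov_text = sample_markov(FREQ, TRANS, 40, rng)
```

In the Markov realization every adjacent pair of letters is a pair that
actually occurs in the sample, which is why it tends to look closer to
the source text than the independent realization does.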

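The corresponding continuous time processes are defined in a similar
manner. Here the time $t$ ranges over an interval $T$.

Definition 6. A process $\{X_t, t \in T\}$ is a process of independent
increments if for any $t_1 < t_2 < \cdots < t_n$ the random variables
$X_{t_2} - X_{t_1}, \ldots, X_{t_n} - X_{t_{n-1}}$ are independent.

Definition 7. A process $\{X_t, t \in T\}$ is a Markov process if for
any $a_1, a_2, \ldots, a_n$ and $t_1 < t_2 < \cdots < t_n$
\[ \Pr[X_{t_n} = a_n \mid (X_{t_{n-1}} = a_{n-1}) \wedge \cdots \wedge (X_{t_1} = a_1)] = \Pr[X_{t_n} = a_n \mid X_{t_{n-1}} = a_{n-1}]. \]

Definition 8. A process $\{X_t, t \in T\}$ is a stationary process if
for any $a_1, a_2, \ldots, a_n$, any $t_1 < t_2 < \cdots < t_n$, and
any $h$,
\[ \Pr[(X_{t_1} = a_1) \wedge (X_{t_2} = a_2) \wedge \cdots \wedge (X_{t_n} = a_n)] = \Pr[(X_{t_1+h} = a_1) \wedge (X_{t_2+h} = a_2) \wedge \cdots \wedge (X_{t_n+h} = a_n)]. \]

Definition 9. A process $\{X_t, t \in T\}$ is a martingale if for any
$a_1, a_2, \ldots, a_{n-1}$ and $t_1 < t_2 < \cdots < t_n$
\[ E[X_{t_n} \mid (X_{t_{n-1}} = a_{n-1}) \wedge \cdots \wedge (X_{t_1} = a_1)] = a_{n-1}. \]

The discrete time independent process is simply a sequence of
independent random variables. The analogue of this is not very
interesting for continuous time, where one considers instead the
independent increment process. If we have an independent increment
process and the distribution of $X_t - X_s$ depends only on the
difference $t - s$, we say that we have an independent increment
process with stationary increments. Such a process is analogous to sums
of independent random variables with a common distribution.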


1.3 Some Probability Concepts

We summarize here a few of the probability concepts that are used
later. A random variable $X$ is a real-valued function on the sample
space such that a measure is assigned to events of the form
$\{\omega \mid X(\omega) \le x\}$. The cumulative distribution of $X$
is the monotone function
\[ F(x) = \Pr[X \le x]. \]
The graph of this function is an increasing curve, which is continuous
except for a finite or denumerable number of jumps. If the cumulative
distribution $F$ can be written in the form
\[ F(x) = \int_{-\infty}^{x} f(t)\,dt \]
for some continuous function $f$, we say that $f$ is the density for
$X$.

If the possible values of $X$ form a finite or denumerable set
$\{a_1, a_2, \ldots\}$, the cumulative distribution function is
determined by the probabilities
\[ p_j = \Pr[X = a_j]. \]
In this case we often speak of $\{p_j\}$ as the distribution of $X$.

A particularly important distribution, which has a density, is the
normal distribution defined by
\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt. \]
The density function is then
\[ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \]

An important special discrete distribution is the Poisson distribution
defined by
\[ p_j = \frac{\lambda^j e^{-\lambda}}{j!} \]
for $j = 0, 1, 2, \ldots$.

The distribution of a random variable that takes on a single value $a$
with probability 1 is said to have all of its mass concentrated at $a$.

The mean, or expected value, of $X$ is denoted by $E[X]$ and is defined
in terms of the cumulative distribution function by the Stieltjes
integral
\[ E[X] = \int_{-\infty}^{\infty} x\,dF(x). \]
In case a density function exists, it is equivalent to
\[ E[X] = \int_{-\infty}^{\infty} x f(x)\,dx, \]
and in the case of a discrete distribution to
\[ E[X] = \sum_j a_j p_j. \]

The variance of $X$ is denoted by $\mathrm{Var}[X]$, and it is given by
\[ \mathrm{Var}[X] = E[(X - E[X])^2]. \]
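
For a discrete distribution the mean and variance defined above reduce
to finite or rapidly converging sums, which can be evaluated directly.
The sketch below is an illustration, not part of the original text: it
represents a distribution as a dictionary from values to probabilities
(an assumed convention, as are the function names), and applies the
formulas to a Poisson distribution with an assumed parameter of 3,
truncated at 60 terms because the remaining tail is negligible.

```python
import math

def poisson(j, lam):
    """p_j = lam^j e^{-lam} / j!  for j = 0, 1, 2, ..."""
    return lam ** j * math.exp(-lam) / math.factorial(j)

def expectation(dist):
    """E[X] = sum of a_j p_j for a discrete distribution {a_j: p_j}."""
    return sum(a * p for a, p in dist.items())

def variance(dist):
    """Var[X] = E[(X - E[X])^2]."""
    m = expectation(dist)
    return sum((a - m) ** 2 * p for a, p in dist.items())

lam = 3.0   # assumed parameter, chosen only for illustration
poisson_dist = {j: poisson(j, lam) for j in range(60)}

poisson_mean = expectation(poisson_dist)
poisson_var = variance(poisson_dist)
```

For the Poisson distribution both quantities come out equal to the
parameter, up to the truncation and rounding error of the computation.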

The covariance of $X$ and $Y$ is denoted by $\mathrm{Cov}[X, Y]$ and is
given by
\[ \mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])]. \]

Fig. 1. Example of distributions.

We shall have occasion to speak of convergence in distribution. The
sequence $X_1, X_2, \ldots$ of random variables converges in
distribution to $X$ if the cumulative distributions of the $X_n$
converge to the cumulative distribution of $X$ at every point at which
the latter is continuous. The best-known result of this kind is the
central limit theorem. This theorem gives conditions under which sums
of independent random variables, suitably normalized, converge in
distribution to the normal distribution.

The schedule is said to be a simple contingent schedule if these
probabilities depend only on the response of the subject on the
$(n + 1)$st trial. It is a double contingent schedule if it depends
only on the last two responses. It is a schedule with past dependence
of length $m$ if it depends only on the $(n + 1)$st response and the
previous $m$ responses and reinforcements. To specify the probabilities
(iii) in the noncontingent case we need give only probabilities
$\pi_1, \pi_2, \ldots$, where $\pi_j$ represents the probability that
the experimenter chooses the $j$th reinforcing event. In the simple
contingent case it is necessary to specify a matrix $\Pi$ with entries
$\pi_{ij}$ that represent the probability that the reinforcing event
$j$ is chosen when the last response of the subject was $i$.

the process just 

on th tC* ^ probabilities for the various responses 

on the «th trial That is, we dehne ^ 

PAa,e, ,e,a,] 

Then we let P„ =b (/> 1 p a p ^ 'ru.. 

'h c^nlmgenrca^'e 

following To show touhe'po “ss V f '''!ra M "’k' '* 

IS contained in thelnowledS of / '’’V 

menter’s choL /" Ttw ^ experi- 

turn depends onlv”on ' “"“"S'"' ''Oso the choice of £ m 

m the know7d ' is contained 

occurrence of all alternM,‘vcs A and fr'e !'*? for the 

bilmes cannot change this givenmformatmn'’ 

two al'Zmlvel o7of Xfifreinfo^d " ““ 

only consider the one di“i;“£[7^’' ‘7| 
pro^Unef a Mar^l^^; = esl tra^sToi 


\[ p \to (1 - \theta_1)p + \theta_1\lambda_1 \quad \text{or} \quad p \to (1 - \theta_2)p + \theta_2\lambda_2. \]
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This IS a Markov process which moves on the unit interval, where on each 
step it moves to one of two possible positions.

We consider next one of the stimulus-sampling models of learning,
namely, the pattern model described by Estes (1959). We assume that there
are only two possible responses and that on each trial one of these is
reinforced. In the pattern model it is assumed that there are $N$ stimulus
patterns and that the subject makes his decision after choosing one of these
patterns. We assume that these choices are equally likely, so that the
probability of choosing any one pattern is $1/N$. Each pattern is
connected to one of the responses, and the subject makes the response to
which the sampled pattern is connected. If the experimenter reinforces the
alternative chosen by the subject, there is no change in the connection of
the patterns. If he reinforces the other alternative, there is probability $c$
that the pattern sampled changes its connection to agree with the response
that was reinforced.

To illustrate the determination of the basic probabilities, we assume that
there are only two patterns. We denote the two responses by 1 and 2, and
similarly for the connections of the patterns. A typical history of the
process now is

\[ \omega = (u_1, v_1, s_1, a_1, e_1, u_2, v_2, s_2, a_2, e_2, \ldots), \]

where $u_n$ is 1 or 2 according to the connection of the first pattern on the
$n$th trial, $v_n$ is 1 or 2 according to the connection of the second pattern,
$s_n$ is 1 or 2 according to which pattern was sampled, $a_n$ is 1 or 2 according
to which response was made, and $e_n$ is 1 or 2 according to which response
was reinforced.

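We indicate the outcome functions by

\[ U_n, V_n, S_n, A_n, E_n, \]

where $U_n$ is the connection of the first pattern on the $n$th trial, $V_n$
the connection of the second pattern, and so on.

To determine our process now we must specify the initial probabilities

\[ \Pr[U_1 = u_1, V_1 = v_1] \]

and the conditional probabilities

(i) $\Pr[U_{n+1} = u_{n+1} \mid u_1 v_1 s_1 a_1 e_1 \cdots u_n v_n s_n a_n e_n]$,

(ii) $\Pr[V_{n+1} = v_{n+1} \mid u_1 v_1 s_1 a_1 e_1 \cdots u_n v_n s_n a_n e_n u_{n+1}]$,

(iii) $\Pr[S_{n+1} = s_{n+1} \mid u_1 v_1 s_1 a_1 e_1 \cdots e_n u_{n+1} v_{n+1}]$,

(iv) $\Pr[A_{n+1} = a_{n+1} \mid u_1 v_1 s_1 a_1 e_1 \cdots e_n u_{n+1} v_{n+1} s_{n+1}]$,

(v) $\Pr[E_{n+1} = e_{n+1} \mid u_1 v_1 s_1 a_1 e_1 \cdots e_n u_{n+1} v_{n+1} s_{n+1} a_{n+1}]$.

We need only consider paths that are possible; for these,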
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the assumptions of the model determine the conditional probabilities 
Those in (i) are determined as follows. If $s_n = 1$ and $u_n \ne u_{n+1}$, the
probability (i) is $c$. If $s_n = 1$ and $u_n = u_{n+1}$, it is $1 - c$. In other cases it
is 0. Similarly, for (ii), if $s_n = 2$ and $v_n \ne v_{n+1}$, the probability (ii) is $c$.
If $s_n = 2$ and $v_n = v_{n+1}$, it is $1 - c$. The probabilities in (iii) are always
1/2, since we have assumed two patterns. The probability in (iv) is 1 if
$a_{n+1}$ agrees with the connection of the pattern last sampled and 0 otherwise.
The probabilities (v) are again at the disposal of the experimenter. Since
the pattern process is presumably unobservable, it would be natural to
assume that these probabilities can only depend on the values of $e_n$ and
$a_n$. We have then the same special cases of noncontingent, simple
contingent, etc., reinforcement.

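Just as in the linear model, Markov theory can be used to study this
process. We introduce as before the process $P_n$ by defining

\[ P_n = \Pr[A_{n+1} = 1 \mid u_1, v_1, s_1, a_1, e_1, \ldots, u_n, v_n, s_n, a_n, e_n]. \]

The value of $p_n$ depends only on the number of patterns connected to
response 1, and in fact it is the fraction of patterns connected to that
response. In the noncontingent or simple contingent case the process
$P_1, P_2, \ldots$ is a Markov process with states $0, 1/N, 2/N, \ldots, 1$.
Equivalently, we can take the states to be $0, 1, 2, \ldots, N$ and obtain a
finite Markov chain, which can be studied by standard methods. We return
to this problem later.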


2 INDEPENDENT PROCESSES 


Some of the most interesting questions about such a process relate to the
averages of the first $n$ trials. Let $S_1, S_2, \ldots$ be the sequence of random
variables defined by $S_n = X_1 + X_2 + \cdots + X_n$. We consider first theorems
that relate to the distribution of $S_n$ for large $n$. These theorems are called
limit laws. After that we consider theorems that relate to the actual sequence
of outcomes $S_1(\omega), S_2(\omega), \ldots$.


2 1 Limit Theorems 


We assume now that $X_1, X_2, \ldots$ is an independent process and that
$S_n = X_1 + X_2 + \cdots + X_n$. If each $X_i$ has a finite expected value, then
$S_n$ has a finite expected value given by

\[ E[S_n] = E[X_1] + E[X_2] + \cdots + E[X_n]. \]

We denote $E[X_i]$ by $a_i$ and $E[S_n]$ by $A_n$. Then $A_n = a_1 + a_2 + \cdots + a_n$. If
each $X_i$ has a finite variance $\mathrm{Var}[X_i] = b_i^2$, then the sum $S_n$ has a
finite variance, which we denote by $B_n^2$. Although it is not true in general
that the variance of the sum of random variables is the sum of the individual
variances, this is the case for independent random variables. Hence

\[ B_n^2 = b_1^2 + b_2^2 + \cdots + b_n^2. \]

Definition 14. The sequence $X_1, X_2, \ldots$ of independent random variables
is said to obey the weak law of large numbers if the random variables

\[ \frac{S_n - A_n}{n} \]

converge in probability to 0.

Very general conditions exist for the weak law of large numbers to hold.
One simple and useful sufficient condition is that $B_n/n \to 0$. In particular,
this is true if the sequence of random variables is uniformly bounded. To say
that a sequence of random variables is uniformly bounded is to say that all
the values that the random variables can assume are less in absolute value
than some number $K$. If the random variables have a common distribution, then
the weak law of large numbers holds whenever the mean value exists for $X_1$.
If $a$ denotes this mean, the theorem states that the distribution of
$(S_n/n) - a$ converges to a distribution concentrated at 0. In other words,
the averages $S_n/n$ converge in probability to $a$. This means that for large $n$
it is very likely that these averages will be near $a$. Of course, this enables
us to use the average of a large number of experiments as an estimate for the
mean value of each experiment when $a$ is unknown. In particular, assume that
we have an experiment that results in a success with probability $p$ and a
failure with probability $q = 1 - p$. We repeat this experiment a sequence of
times. Then if $X_j$ is 1 when the $j$th experiment is a success and 0 otherwise,
we have a sequence of independent random variables with mean $p$. The sum
$S_n$ is the number of successes in the first $n$ experiments, and $S_n/n$ is the
fraction of successes. If we interpret the probability $p$ as the long-run
fraction of successes, the law of large numbers gives some justification for
this interpretation.

We consider now the second important theorem about sums of independent
random variables, namely, the central limit theorem.

Definition 15. The sequence $X_1, X_2, \ldots$ of independent random variables
is said to obey the central limit theorem if the distribution of

\[ S_n^* = \frac{S_n - A_n}{B_n} \]

converges to the normal distribution with mean 0 and variance 1.

The effect of subtracting the mean $A_n$ from $S_n$ is to give a random variable
with mean 0, and the effect of then dividing this by the standard deviation
$B_n$ is to give a random variable with variance 1. Thus, for each $n$, the
random variable $S_n^*$ has the same mean and variance as the unit normal
distribution, and the central limit theorem states that for large $n$ it has
approximately the same distribution as this normal distribution. Necessary
and sufficient conditions for the central limit theorem are known, but
they are fairly complicated. A simple sufficient condition is that the
individual summands $X_1, X_2, \ldots$ should be uniformly bounded and that the
variance $B_n^2$ should tend to infinity. The effect of these conditions is to
assure that the contribution of any one summand cannot have
a large influence on the total sum. Some such condition is
necessary.

When the summands have a common distribution, a sufficient condition for
the central limit theorem to hold is that the summands $X_1, X_2, \ldots$
have a finite variance. If we denote the common mean by $a$ and the
common variance by $b^2$, the central limit theorem then states that the
distribution of

\[ \frac{S_n - na}{b\sqrt{n}} \]

converges to the unit normal distribution.

The central limit theorem is remarkable for its
generality. The summands might represent quite different types of
observations, and their distributions might be very unsymmetrical;
nonetheless, the averages of sufficiently many of them have a distribution
determined by a symmetrical normal curve. The generality of the
theorem is often offered as the explanation for the frequent occurrence of
the normal distribution. For example, it is argued that the height of an
individual is the cumulative effect of a large number of independent factors.
It is, however, quite possible to obtain limiting distributions other than
the normal distribution by adding up independent quantities. As an example
we consider next a situation that leads to the Poisson distribution.
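Before turning to the Poisson case, the weak law of large numbers and the central limit theorem stated above can be checked numerically for the success-failure experiment. The following is only a sketch: the function names, the choice $p = 0.3$, the sample sizes, and the tolerance 0.05 are illustrative assumptions, not part of the original text.

```python
import math
import random

def successes(n, p, rng):
    """Return S_n, the number of successes in n independent trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def unit_normal_cdf(x):
    """Distribution function of the normal distribution with mean 0, variance 1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def experiment(p=0.3, n=2000, repetitions=1000, seed=1):
    """Repeat the n-trial experiment many times (all parameters are assumptions)."""
    rng = random.Random(seed)
    a_n = n * p                        # A_n = E[S_n]
    b_n = math.sqrt(n * p * (1 - p))   # B_n = standard deviation of S_n
    sums = [successes(n, p, rng) for _ in range(repetitions)]
    # Weak law: how often does the average S_n/n miss p by more than 0.05?
    far = sum(1 for s in sums if abs(s / n - p) > 0.05) / repetitions
    # Central limit theorem: empirical Pr[S_n* <= 1], S_n* = (S_n - A_n)/B_n.
    below = sum(1 for s in sums if (s - a_n) / b_n <= 1.0) / repetitions
    return far, below

far, below = experiment()
print("fraction of averages far from p:", far)
print("empirical Pr[S_n* <= 1]:", below, "unit normal value:", unit_normal_cdf(1.0))
```

The first number should be close to 0 and the second close to the unit normal value, in line with the two definitions.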
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Assume an experiment is repeated n times and that on each trial the 
occurrence or nonoccurrence of some event is recorded. Let $p$ be the
probability that the event occurs and $1 - p$ that it does not. Let $X_1, X_2,
\ldots, X_n$ be the outcomes, which are either 1 if the event occurs or 0
otherwise. Then $S_n = X_1 + X_2 + \cdots + X_n$ gives the number of times
the event occurs in the $n$ trials. If we let $n$ increase and change $p$ in such a
way that the mean of $S_n$, namely, $np$, tends to a limit $\lambda$, the distribution of
$S_n$ converges to the Poisson distribution with mean $\lambda$. One might object
that in this situation we have changed the distribution of the summands as
we increase $n$. However, this is also true in the central limit theorem:
after normalizing, the $n$th sum $S_n^*$ is the sum of the $n$ independent random
variables $X_{1n}, X_{2n}, \ldots, X_{nn}$, where $X_{in} = (X_i - a_i)/B_n$. Similarly, in the
weak law of large numbers we are really talking about the distribution
of the average $S_n/n$, and this is the sum of the $n$ independent random
variables $X_{1n}, X_{2n}, X_{3n}, \ldots, X_{nn}$, where $X_{in} = X_i/n$.
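The convergence to the Poisson distribution can be seen without simulation by comparing the two probability functions directly. This is a minimal sketch under assumed values ($\lambda = 2$ and $n = 10, 100, 1000$); the helper names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Pr[S_n = k] for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Pr[Y = k] for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_gap(n, lam, kmax=20):
    """Largest pointwise difference for k = 0..kmax when p = lam / n."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(kmax + 1))

gaps = [max_gap(n, 2.0) for n in (10, 100, 1000)]
print(gaps)  # the gaps shrink as n grows with np held at 2
```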

These results suggest the formulation of a very general problem: What
distributions can occur as limits of sums of the form

\[ X_{1n} + X_{2n} + \cdots + X_{nn}, \]

where each $X_{in}$ is a random variable independent of the others and the
contribution of each to the sum is small for large $n$? (This, of course,
must be made precise.)

This problem has been completely solved and is one of the most elegant
theories in probability. The details are too complex to go into here, but
suffice it to say that the class of limiting distributions which can occur is
huge, indeed infinite, in a nontrivial sense. These have been called infinitely
divisible distributions, and an analytic description of the members of
this class, with conditions under which specific distributions occur as limit
distributions, may be found in the book by Gnedenko and Kolmogorov.

Let $X_1, X_2, \ldots$ be a sequence of identically distributed independent random
variables. Under what conditions can we find a sequence of constants $A_n$ and
$B_n$ so that the distributions of the random variables

\[ \frac{S_n - A_n}{B_n} \]

converge to a limiting distribution? When the summands have a finite
variance, the limit is the normal distribution; this
is the central limit theorem. However, as we shall see, other limiting
distributions do enter into the study of quite simple processes. As pointed
out by Miller and Chomsky in Chapter 13 (pp. 456-457) of this Handbook,
distributions with infinite mean and variance also occur naturally in
applications in the social sciences.


2 2 Properties of Sample Sequences 

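So far we have considered only the question of the distribution of the
sum $S_n$ for a large number of trials. We now consider theorems that relate
to the entire history of the process.

Definition 16. A sequence $X_1, X_2, \ldots$ of independent random
variables is said to obey the strong law of large numbers if the
random variables

\[ \frac{S_n - A_n}{n} \]

converge with probability 1 to 0.

A sufficient condition for the strong law of large numbers to hold is that
the individual summands should have finite variances $b_k^2$ and the sum

\[ \sum_{k=1}^{\infty} \frac{b_k^2}{k^2} \]

should be finite. This is true, in particular, when the individual
summands are uniformly bounded.

When the summands have a common distribution, for the averages $S_n/n$ to
converge with probability 1 to the mean value it is
necessary only to assume that this mean value $E[X_1]$ exists.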

An example may be helpful in understanding the meaning of this theorem.
Consider an experiment in which a certain event occurs with probability $p$.
Let us repeat this experiment many times and record 1 if the event occurs
and 0 otherwise. Then we have a sequence of identically distributed
independent random variables, and the strong law of large numbers states
that the fraction of times that the event occurs approaches the probability
$p$ with probability 1. For example, in a sequence of tosses of a fair coin it
states that with probability 1 the average number of heads tends to 1/2.
Of course, there is nothing to stop the coin from coming up heads every
time, and this is why the "probability 1" is necessary in the statement of
the theorem. The point is that the theorem states that our method of
assigning measures to subsets of the space of all sequences of possible
outcomes is such as to assign measure 1 to the set of all sequences which
converge to 1/2. Note that if our model had assigned some other probability
$p$ to heads (that is, a biased coin), the probability measure would assign
measure 1 to the set of sequences that converge to $p$ and so measure 0 to
(among others) the set of sequences that converge to 1/2.

Let us return now to our penny-matching game. In Fig. 2 we have
shown three different histories that resulted from games of length
100 plays each. We note that the graphs do not cross the axis very often.
In fact, in each case one player is ahead the vast majority of the time, and
in one case one player is always ahead. This is surprising: if we were
to ask for the most likely fraction of time that a player would be ahead in
such a game, we would be apt to guess 1/2. In fact, this is completely
wrong. The most likely thing is for one player to be ahead all the time,
and the least likely thing is for a player to be ahead half the time. This is
made more precise in the following limit theorem proved by E. S.
Andersen (1953).

Fig. 2. Three histories of penny matching.

Fig. 3. Limiting distribution of the fraction of time that certain random variables are
positive.

Theorem 1 (Arc Sine Law). Let $X_1, X_2, \ldots$ be identically distributed,
independent random variables. Let $S_n = X_1 + \cdots + X_n$ and let $N_n$ be
the number of times $S_j > 0$ for $j = 1, 2, 3, \ldots, n$. Then if $\Pr[S_n > 0] =
a_n \to a$ for $n \to \infty$ with $0 < a < 1$,

\[ \Pr\left[\frac{N_n}{n} \le t\right] \to \frac{\sin \pi a}{\pi} \int_0^t x^{a-1} (1 - x)^{-a} \, dx. \]

In particular, if the random variables $X_i$ have a symmetric distribution,
that is, if $\Pr[X_i < -\lambda] = \Pr[X_i > \lambda]$, then $\Pr[S_n > 0] \to 1/2$. Then the
distribution of the fraction of time that $S_1, S_2, \ldots, S_n$ are positive tends
to a limiting distribution with density $f(x) = 1/(\pi\sqrt{x(1 - x)})$ on the
interval [0, 1]. The limiting distribution function in this case is
$F(x) = (2/\pi) \arcsin \sqrt{x}$. A graph of this density and distribution
function is given in Fig. 3. This is the situation in our penny-matching
game. Unlike the normal density, this density increases as we
approach the end points 0 and 1, and it has a minimum at 1/2.
Thus even in a situation of complete randomness, such as
coin tossing, long periods in which one player stays ahead are to be
expected. This warns us against concluding a lack of randomness from the
observation of such apparent regularities.
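The arc sine law can be illustrated by simulating many penny-matching games and recording the fraction of plays on which the first player is ahead. This is a sketch only; the game length, the number of games, the seed, and the bands 0.1 and 0.45 to 0.55 are assumptions chosen for illustration.

```python
import math
import random

def fraction_positive(n, rng):
    """Fraction of the first n plays on which the sum S_j is positive."""
    total, positive = 0, 0
    for _ in range(n):
        total += 1 if rng.random() < 0.5 else -1
        if total > 0:
            positive += 1
    return positive / n

def arcsine_cdf(x):
    """Limiting distribution function F(x) = (2/pi) arcsin sqrt(x)."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

rng = random.Random(2)
observed = [fraction_positive(400, rng) for _ in range(2000)]
# Share of games with one player ahead at least 90% of the time ...
extreme = sum(1 for f in observed if f <= 0.1 or f >= 0.9) / len(observed)
# ... against the share of games that are close to evenly split.
middle = sum(1 for f in observed if 0.45 <= f <= 0.55) / len(observed)
print("extreme:", extreme, "limit value:", 2 * arcsine_cdf(0.1))
print("middle:", middle, "limit value:", arcsine_cdf(0.55) - arcsine_cdf(0.45))
```

Lopsided games should turn out to be several times more common than evenly split ones, as the density in Fig. 3 suggests.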


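Since the arc sine law shows that it might take a long time
for the two players to equalize their fortunes, it is natural to
ask if it is at least certain that their fortunes will eventually be
equalized. The answer is given by the following theorem.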
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Theorem 2 (Recurrence Theorem) Let X„ X,, be a 

Returning again to our penny-matching game, the
theorem assures us that at some time the process will return
to the starting point 0. In fact, since the process then starts over,
with probability 1 it will return a second time. Continuing this way we
can assert with probability 1 that it will return an infinite number of times.

It is therefore possible to study the times between these returns.
Let $T_1$ be the time required for the first return, $T_2$ the
time required before the second return, and in general $T_n$ the
number of plays between the $(n - 1)$st and $n$th returns. These
form a new sequence of integer-valued, identically distributed, independent
random variables. However, in this case the common distribution has an
infinite mean. Hence we cannot expect the standard limit theorems,
such as the law of large numbers or the central limit theorem, to hold.
It is possible to obtain a limit theorem for the random variables $S_n =
T_1 + T_2 + \cdots + T_n$, but the limiting distribution is not normal.
Results concerning the sums $S_n$ also give information about the random
variables $N_n$ which give the number of returns to 0 in the first $n$ plays,
that is, the number of equalizations. This is
true because $\Pr[N_n \ge k] = \Pr[S_k \le n]$. For a discussion of these limit
laws see Feller (1957).

Let us summarize what we now know about the sample paths for our
penny-matching experiment. By the strong law of large numbers,
$S_n/n$ converges to 0 with probability 1. However, we know also that $S_n$
is 0 infinitely often. Our last result is concerned with the question of how
large we can expect $S_n$ to grow between 0's. This is formulated
as follows. Let $\{a_n\}$ be an increasing sequence of positive numbers. How
fast can these increase and still permit $S_n > a_n$ infinitely often with
probability 1? Let us first try $a_n = \epsilon n$, that is, a straight
line graph. The law of large numbers tells us that for $\epsilon > 0$ and almost
every $\omega$, $S_n(\omega)/n < \epsilon$ for sufficiently large $n$. That is, $S_n < \epsilon n$ for
sufficiently large $n$. Thus the line graph increases too fast. The
following definition of the law of the iterated logarithm gives us a sequence
that is just on the borderline: a sequence that increases slightly faster is
exceeded only finitely often, and one that increases slightly slower is
exceeded infinitely often.

Definition 17. The sequence $X_1, X_2, \ldots$ of independent random variables
is said to obey the law of the iterated logarithm if, with probability 1,
only finitely many of the sums satisfy the inequality

\[ S_n > A_n + \lambda \sqrt{2 B_n^2 \log\log B_n} \]

for every $\lambda > 1$, but for every $\lambda < 1$ this inequality
holds for infinitely many $S_n$.

As in the central limit theorem and the weak law of large numbers, very
general conditions are known for which the law of the iterated logarithm
holds. As in these two theorems, a sufficient condition is again that the
individual summands should be uniformly bounded and that the variance
of the sum should tend to infinity. For a more complete discussion of
these conditions and proofs see Feller (1957).


2 3 Examples and Special Problems (Examples 
of Independent Increments Processes) 

As we have mentioned previously, the independent increments process
is the continuous time analog of sums of independent random variables.
We shall here discuss only two special processes of independent increments,
the Brownian motion and the Poisson processes. They have played a
central role in the development of stochastic process theory, and they
serve as good examples of the way in which important processes arise
from specific experimental situations.

The Brownian motion process was developed as a model to describe the
motion of a particle of microscopic size in a fluid, say a colloidal particle.
The motion is assumed to be caused by impacts on it by the other molecules.

Let $X_t$ be the $x$ coordinate of a particle at time $t$. The assumption that
the medium is in microscopic equilibrium suggests that the distribution of
$X_t - X_s$ for $t > s$ should be symmetric and depend only on $t - s$. Also, as
a first approximation, it should be independent of previous displacements.
That is, this should be an independent increments process with stationary
increments. Because the displacement $X_t - X_s$ is the cumulative effect of
a large number of small random impacts, the central limit theorem suggests
that $X_t - X_s$ should have a normal distribution. From these assumptions
the process $X_t$ can be characterized as an independent increments process
with stationary increments $X_t - X_s$ having a normal distribution with
mean 0 and variance proportional to the time difference $t - s$.
As was pointed out earlier, a typical sample path for a continuous time
process is a function, and we can think of the history as described by the
graph of the function. The basic probability measure assigns measures to
certain subsets of functions. Thus we can speak of the probability that an
outcome function is one of a certain class of functions. For example, we
can ask for the probability that the outcome function is continuous. For
the case of Brownian motion the probability is 1 that the outcome
function is continuous.

Somewhat less satisfactory is the fact that, with probability 1, the
paths turn out to have infinite length in any time interval. Doob
(1960) has pointed out that this difficulty cannot be explained
away by adding a hypothesis that the paths should have finite
length. The fact is that the hypotheses already made determine the process
and determine the fact that the paths have infinite length. Doob suggested
that examples like this show that some care is needed in writing down
hypotheses about a process if one is to avoid inconsistent
hypotheses. He also suggested that, when a model leads to
conclusions repugnant to the scientist, he may be simply taking the model
too seriously. The fact is that the model does not describe the
behavior of the particle in the small very well.

The Poisson process is again one of independent increments.
The increments are stationary, and the distribution of
$X_t - X_s$ is a Poisson distribution with mean proportional to the time
difference $t - s$, that is, $c(t - s)$. The outcome functions of this process
are monotone increasing, but increase only by jumps of unit magnitude.
The expected number of jumps in an interval of length $l$ is $cl$. The Poisson
process is often used as a model for events occurring randomly in time.
An event occurs when the sample function jumps. The rate at which
the events occur is the constant $c$. Then $X_t$ is the number of events that
have occurred prior to time $t$.

Assume now that we want a model for the random occurrence of events
in time. For example, a radioactive source is emitting particles in such a
way that the time of emission of a particle is random. We take $X_0 = 0$ and
$X_t$ the number of events that have occurred by time $t$.
Then, if we assume that $X_t$ is a process with stationary independent
increments and in addition that only a finite number of events occur in any
finite time interval, we are led to the Poisson process. Again very simple
hypotheses lead to a quite specific process.

The Poisson process has been used, for example, in the study of queues
or waiting lines. The assumption is that customers arrive
according to a Poisson process. A consequence of the Poisson model is
that the times between the arrivals of the $j$th and $(j + 1)$st customers are
independent random variables with a common distribution. If $T_1, T_2, \ldots$
are these times, then the distribution is given by

\[ \Pr[T_j \le t] = 1 - e^{-ct}, \]

that is, an exponential distribution. A consequence of this is that if we
are using this model for arrival of customers in a line then, given that no
customer has arrived by time $s$, the probability that a customer has
arrived by time $s + t$ is independent of $s$. Again, if this consequence seems
unreasonable, this means that the Poisson process is not appropriate or
that we are being too fussy about the applicability of our model.
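Both properties of the Poisson model just described can be checked in a few lines: the independence of the waiting probability from $s$, and the expected number $cl$ of events in an interval of length $l$. This is a sketch with assumed values ($c = 0.5$ and $c = 2$, an interval of length 10); the function names are hypothetical.

```python
import math
import random

def conditional_arrival(s, t, c):
    """Pr[T <= s + t | T > s] when Pr[T <= t] = 1 - exp(-c t)."""
    survive_s = math.exp(-c * s)            # Pr[T > s]
    survive_st = math.exp(-c * (s + t))     # Pr[T > s + t]
    return (survive_s - survive_st) / survive_s

def count_events(length, c, rng):
    """Number of events in [0, length] when gaps are exponential with rate c."""
    time, events = 0.0, 0
    while True:
        time += rng.expovariate(c)
        if time > length:
            return events
        events += 1

# The conditional probability is the same whatever the elapsed time s.
waits = [conditional_arrival(s, 1.0, 0.5) for s in (0.0, 1.0, 5.0)]

# Average number of events in an interval of length 10 at rate c = 2.
rng = random.Random(3)
counts = [count_events(10.0, 2.0, rng) for _ in range(2000)]
mean_count = sum(counts) / len(counts)
print(waits, mean_count)
```

The three waiting probabilities agree with $1 - e^{-0.5}$, and the average count should be near $cl = 20$.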
RENEWAL THEORY. A renewal process is a special case of a sequence
of independent random variables. We assume that $X_1, X_2, \ldots$ are
nonnegative, integral valued, identically distributed independent random
variables. We interpret $X_i$ as the lifetime of the $i$th article, which at the
end of its lifetime is replaced. A simple example is the replacement of light
bulbs. We can also consider a situation in which a subject repeats a task a
number of times and let $X_i$ be the length of time required to complete the
task on the $i$th opportunity.

We write as usual $S_n = X_1 + X_2 + \cdots + X_n$. Then $S_n$ is the time
of the $n$th renewal. We write $N_t$ for the largest value of $n$ for which
$S_n \le t$. That is, it is the number of renewals that have occurred by time $t$.
These two quantities are connected by the fact that

\[ \Pr[N_t \ge n] = \Pr[S_n \le t]. \]

It is intuitively clear that if $m = E[X_1]$, then in time $t$ we should expect
about $t/m$ renewals. The following theorem summarizes this fact.

Theorem 3. If $N_t$ is the number of renewals in time $t$, then

\[ \frac{E[N_t]}{t} \to \frac{1}{m}, \]

where $m = E[X_1]$. If $m$ is infinite, the limit is 0. If $X_1$ has finite variance
$\sigma^2$, then

\[ \Pr\left[ \frac{N_t - t/m}{\sqrt{t\sigma^2/m^3}} \le x \right] \to \Phi(x), \]

where $\Phi$ is the unit normal distribution function.

The limit theorem is obtained by applying the central limit theorem to
$S_n$. If the variance is infinite, limit theorems for $N_t$ can also be obtained,
but these will involve nonnormal distributions. For a more complete
discussion of this theorem, see Smith (1958).
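The first statement of Theorem 3 is easy to check by simulation. The sketch below assumes, purely for illustration, lifetimes that are 1, 2, or 3 with equal probability (so $m = 2$) and a horizon $t = 10000$; `renewals_by` is a hypothetical helper name.

```python
import random

def renewals_by(t, lifetime, rng):
    """N_t: the largest n with S_n <= t, where S_n sums the first n lifetimes."""
    total, count = 0, 0
    while True:
        total += lifetime(rng)
        if total > t:
            return count
        count += 1

def light_bulb(rng):
    """Assumed lifetime distribution: 1, 2, or 3 time units, equally likely."""
    return rng.choice((1, 2, 3))

rng = random.Random(0)
t = 10000
n_t = renewals_by(t, light_bulb, rng)
print(n_t / t)  # should be near 1/m = 0.5
```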

The results may be applied to recurrent Markov chains as follows.
Start a recurrent chain in state 0 and let $X_i$ be the length of time between
the $(i - 1)$st and $i$th return to 0. Then $X_1, X_2, \ldots$ is a renewal process.
In a finite chain the means and variances are finite. They may be infinite
in a denumerable chain, as in returns to the origin in our penny-matching
game.

AN EXAMPLE FROM PSYCHOLOGY. In our discussion of limit theorems
for sums of independent random variables we stressed that limit theorems
are obtained when it is assumed that the contribution of any one summand
is small compared with the total sum. It is interesting that in the
Bush-Mosteller learning model one meets a situation in which this is not the
case, and as a result we have a quite different behavior.

Specifically, consider the Bush-Mosteller model for the noncontingent
case. Then we are led to a Markov process with state space the unit
interval and transition probabilities given by

\[ x \to (1 - \theta)x + \theta \ \text{with probability } p, \qquad x \to (1 - \theta)x \ \text{with probability } 1 - p. \]

If the process is started at 0, the distribution of $P_n$ is the
same as the distribution of the sum

\[ S_n = \sum_{i=1}^{n} \theta (1 - \theta)^{i-1} \epsilon_i, \]

where the $\epsilon_i$ are a sequence of independent random variables which are 1
with probability $p$ and 0 with probability $1 - p$. The sums $S_n$ converge to
a finite total sum

\[ S = \sum_{i=1}^{\infty} \theta (1 - \theta)^{i-1} \epsilon_i. \]

Hence $\lim_{n \to \infty} \Pr[S_n \le x] = \Pr[S \le x]$. In the case $\theta = 1/2$ and
$p = 1/2$, $S$ has the uniform distribution. That is, $\Pr[S \le x] = x$ for
$0 \le x \le 1$. However, in the case $\theta = 2/3$ (with $p = 1/2$ as before)
the distribution is a famous pathological distribution in mathematics
called the Cantor function. It has all its mass on a set of points that has
length 0 according to the usual measure of length (Lebesgue measure). It is
still an interesting outstanding problem in mathematics to describe the
type of limiting distribution that is obtained for other choices of $\theta$ and $p$.
The fact that distributions which mathematicians consider as pathological
occur in such a simple model in psychology indicates the danger of
trying to assume that everything will be simple in applications.
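The contrast between the two cases can be seen exactly by listing every value that a finite sum of the form $\sum_i \theta(1 - \theta)^{i-1}\epsilon_i$ can take. This sketch assumes that form of the sum and $p = 1/2$, so that all $2^n$ patterns of the $\epsilon_i$ are equally likely; the function name and $n = 4$ are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def partial_sums(theta, n):
    """All values of sum_i theta (1 - theta)^(i-1) eps_i with eps_i in {0, 1}."""
    weights = [theta * (1 - theta) ** i for i in range(n)]
    return sorted(sum(w * e for w, e in zip(weights, eps))
                  for eps in product((0, 1), repeat=n))

# theta = 1/2: the 16 equally likely values form an even grid on [0, 1).
half = partial_sums(Fraction(1, 2), 4)

# theta = 2/3: every value avoids the middle third, as in the Cantor set.
two_thirds = partial_sums(Fraction(2, 3), 4)

print(half)
print(two_thirds)
```

With $\theta = 1/2$ the values are $k/16$ for $k = 0, \ldots, 15$, a discrete version of the uniform distribution; with $\theta = 2/3$ no value falls strictly between 1/3 and 2/3.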


3 MARKOV PROCESSES 


We begin our treatment of Markov processes by considering a
discrete-time, finite-state Markov chain with stationary transition
probabilities. We denote the states by integers $1, 2, \ldots, r$. Recall that
such a process is completely determined by specifying how it starts, that is,
by the initial probabilities $\Pr[X_0 = i]$, and by the transition probabilities
$p_{ij}$, which represent

\[ p_{ij} = \Pr[X_{n+1} = j \mid X_n = i]. \]

It is customary to exhibit the transition probabilities in the form of a
matrix $P = (p_{ij})$. These probabilities can also be exhibited schematically.
For example, consider the simple random walk that moves through the
integers 1, 2, 3, 4, 5. We assume that when it is in state 2, 3, 4, it moves
with equal probability one step to the right or left. When in 1 or 5 it
remains in this state. Then these probabilities may be indicated by a
transition diagram or as a matrix, with rows and columns indexed by the
states 1 to 5:

\[ P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1/2 & 0 & 1/2 & 0 & 0 \\
0 & 1/2 & 0 & 1/2 & 0 \\
0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}. \]

Matrix theory plays an important role in the study of Markov chains.
For example, the $n$th power of $P$ gives the probabilities $p_{ij}^{(n)}$ of moving
from state $i$ to state $j$ in $n$ steps.
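The role of matrix powers can be made concrete with the five-state random walk above. The sketch below uses exact fractions and plain lists; the helper names `mat_mul` and `mat_pow` are assumptions, and states 1 to 5 correspond to indices 0 to 4.

```python
from fractions import Fraction

H = Fraction(1, 2)
# Transition matrix of the random walk on states 1..5 (indices 0..4).
P = [
    [1, 0, 0, 0, 0],
    [H, 0, H, 0, 0],
    [0, H, 0, H, 0],
    [0, 0, H, 0, H],
    [0, 0, 0, 0, 1],
]

def mat_mul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(A)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result

P2 = mat_pow(P, 2)
P50 = mat_pow(P, 50)
print(P2[2])                         # two-step probabilities from state 3
print([float(x) for x in P50[2]])    # after 50 steps, nearly all mass is at 1 and 5
```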


3 1 Classification of States for Finite Chains 


The study of finite chains is greatly simplified by classifying the states into
several different types and studying special chains of each type. Knowledge
about the behavior of these special chains answers questions about the
most general chain.

Definition 18. A state $i$ is called transient if, when the process is started
in $i$, the probability of ever returning to this state is less than 1. It is
called recurrent if the probability of returning is 1.

We denote the set of all recurrent states by $R$
and the set of all transient states by $T$. In a finite chain
it is possible to have all states recurrent, but it is not possible to have all
states transient. The set $R$ of recurrent states is subdivided as follows. We
say that states $i$ and $j$ communicate if it is possible to go both from $i$ to $j$
and from $j$ to $i$. The relation of communication is an equivalence relation
and as such partitions the set $R$ into equivalence classes, which we denote
by $R_1, R_2, \ldots, R_k$. If the process starts in an equivalence class $R_i$, it
remains in this class for all time. Choose a state $i$ in such a class and
consider the times $n$ such that $p_{ii}^{(n)} > 0$. These are the times at which it is
possible to return to state $i$. It can be shown that if $d$ is the greatest
common divisor of these times, then all sufficiently large multiples of $d$
are possible return times. The number $d$ is called the period of the state.
All states in the same equivalence class $R_i$ have the same period, so we
can speak of the class $R_i$ as having period $d$. It is possible to subdivide
the class further into $d$ mutually disjoint subsets $C_1, C_2, \ldots, C_d$, called
cyclic subsets. If the process is started in $C_1$, it can move from there only
to a state in $C_2$, and so on; from $C_d$ it can go only to $C_1$. That is,
it moves cyclically through the $d$ subclasses.

If we start the process in a transient state, it eventually reaches one of
the recurrent classes, and from then on it remains in this class, moving
cyclically through the cyclic subsets. We can therefore first study the
behavior of the process among the transient states until it reaches a
recurrent class, and then consider the behavior once it is in a recurrent class.
The two theories can then be put together to give the most general behavior.
To carry out this program, it is convenient to define a special type of chain
called an absorbing chain.

Definition 19. A state is absorbing if it is impossible to leave the state.
That is, $i$ is absorbing if $p_{ii} = 1$. A chain is absorbing if it is possible to
reach an absorbing state from every state.
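For a finite chain the classification described above depends only on which transitions are possible, so it can be computed from the pattern of nonzero entries of $P$. The following is a sketch under stated assumptions, not a procedure from the text: it uses the fact that in a finite chain a state is recurrent exactly when every state reachable from it leads back to it, and it finds the period from return times of length at most $3r$, a bound chosen here to be long enough for an $r$-state chain. All function names are hypothetical.

```python
from math import gcd

def reachable(P):
    """reach[i][j] is True if j can be reached from i in zero or more steps."""
    r = len(P)
    reach = [[i == j or P[i][j] > 0 for j in range(r)] for i in range(r)]
    for k in range(r):                      # Warshall's transitive closure
        for i in range(r):
            for j in range(r):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    return reach

def classify(P):
    """Return (transient states, list of recurrent communicating classes)."""
    r = len(P)
    reach = reachable(P)
    recurrent = [i for i in range(r)
                 if all(reach[j][i] for j in range(r) if reach[i][j])]
    transient = [i for i in range(r) if i not in recurrent]
    classes = []
    for i in recurrent:
        if not any(i in c for c in classes):
            classes.append([j for j in recurrent if reach[i][j] and reach[j][i]])
    return transient, classes

def period(P, i):
    """Greatest common divisor of the possible return times to state i."""
    r = len(P)
    d = 0
    row = [1 if j == i else 0 for j in range(r)]   # states possible after 0 steps
    for n in range(1, 3 * r + 1):
        row = [1 if any(row[k] and P[k][j] > 0 for k in range(r)) else 0
               for j in range(r)]
        if row[i]:
            d = gcd(d, n)
    return d

walk = [[1, 0, 0, 0, 0], [.5, 0, .5, 0, 0], [0, .5, 0, .5, 0],
        [0, 0, .5, 0, .5], [0, 0, 0, 0, 1]]
flip = [[0, 1], [1, 0]]
print(classify(walk), period(walk, 0))
print(classify(flip), period(flip, 0))
```

For the random walk on 1 to 5 (indices 0 to 4) the two end states form separate recurrent classes of period 1 and the interior states are transient; the two-state chain that alternates deterministically is a single recurrent class of period 2.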


3 2 Transient Behavior for Finite Chains 

We want here to study the behavior of a finite chain as it moves among
the transient states. Once it reaches a recurrent state it can never go back
to a transient state. Thus we are interested in the behavior of the chain
up to the first time that it hits a recurrent state. To study this it is
convenient to form a new chain by making all the recurrent states into
absorbing states, thereby obtaining an absorbing chain. We assume that we
have made this modification in our chain, and we renumber the states so that
the absorbing states come first. We can then put our transition matrix in
the form

\[ P = \begin{pmatrix} I & 0 \\ R & Q \end{pmatrix}, \]

where the first block of rows corresponds to the absorbing states and the
second to the transient states.

The entries of the matrix $R$ give the probabilities of moving from the
transient states into the absorbing states in one step. The entries of $Q$
give the probabilities of moving in one step from a transient state to the
other transient states. The matrix $I$ is the identity matrix of dimension
equal to the number of absorbing states, and 0 is the matrix of all 0's with
the number of rows equal to the number of absorbing states and the number
of columns equal to the number of transient states.

When the transition matrix is put in the preceding canonical form, its
$n$th power has the form

\[ P^n = \begin{pmatrix} I & 0 \\ B^{(n)} & Q^n \end{pmatrix}. \]

The $ij$th entry of $B^{(n)}$ represents the probability that starting in the
transient state $i$ the process is absorbed in state $j$ during the first $n$ steps.
These probabilities are nondecreasing in $n$ and hence tend to a limiting value
which we denote by $b_{ij}$. Then $b_{ij}$ is the probability that starting in state
$i$ the process is eventually absorbed in state $j$.

The $ij$th entry of $Q^n$ gives the probability that starting in the transient
state $i$ the chain is in the transient state $j$ after $n$ steps. Since the process
is eventually absorbed, these probabilities must tend to 0. Thus, the matrix
$P^n$ approaches a limiting matrix $A$ of the form

\[ A = \begin{pmatrix} I & 0 \\ B & 0 \end{pmatrix}. \]

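The $n$th powers of $Q$ can be found explicitly when $Q$ can be written in the
form

\[ Q = U \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_s \end{pmatrix} U^{-1}, \]

where the middle matrix has nonzero entries only on the diagonal. This
involves finding the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_s$, that is, numbers $\lambda$
such that, for some nonzero vector $u$, $uQ = \lambda u$, and this can be quite
difficult. Once done, however, we can write

\[ Q^n = U \begin{pmatrix} \lambda_1^n & & 0 \\ & \ddots & \\ 0 & & \lambda_s^n \end{pmatrix} U^{-1}. \]

From this a simple expression for $B^{(n)}$ can be obtained, since

\[ B^{(n)} = (I + Q + Q^2 + \cdots + Q^{n-1}) R
= U \begin{pmatrix} \dfrac{1 - \lambda_1^n}{1 - \lambda_1} & & 0 \\ & \ddots & \\ 0 & & \dfrac{1 - \lambda_s^n}{1 - \lambda_s} \end{pmatrix} U^{-1} R. \]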
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When, we have a Markov chain model we are interested in studying a 
variety of descriptive quantities which relate to the process under
observation. Many of the descriptive quantities for absorbing chains can be
obtained by simple matrix operations on one matrix called the fundamental
matrix in Kemeny and Snell (1960). This is the matrix

\[ N = I + Q + Q^2 + \cdots. \]

This matrix is also the inverse of $I - Q$, that is,

\[ N(I - Q) = (I - Q)N = I. \]

The entries of this matrix have a simple probabilistic interpretation.
The entry $n_{ij}$ is the mean number of times that the process is ever in state $j$
(counting the initial state) when it is started in state $i$. The matrix $B$ may
be expressed simply in terms of $N$ by

\[ B = NR. \]

Consider next the time $t$ to absorption. Clearly $t$ is a random variable
whose distribution and moments depend upon the starting state. Let us
denote by $g_i^{(k)}$ the expected value of $t^k$ when the process is started
in state $i$. In particular, $g^{(1)}$ is a vector whose components give the mean
number of steps to absorption for the various starting states. Then $g^{(k)}$
gives the $k$th moment for these times. We first observe that

\[ g^{(1)} = N\mathbf{1}, \]

where $\mathbf{1}$ is a column vector with all entries 1. To see this, we observe that
$n_{ij}$ is the mean number of times in state $j$ starting in state $i$ and hence

\[ \sum_{j \text{ transient}} n_{ij} \]

is the mean number of times in a transient state starting in $i$. That is,
$g_i^{(1)}$ is the sum of the components in the $i$th row of $N$. We shall give the
method of computing $g^{(k)}$ in some detail since it is typical of an important
technique in Markov chain theory. We first observe that

\[ g_i^{(k)} = E_i[t^k] = \sum_{j \text{ transient}} p_{ij} E_j[(t + 1)^k] + \sum_{j \text{ absorbing}} p_{ij}. \]

This expresses the mean value of $t^k$ in terms of the possible outcomes of
the first step and the conditional expectations given these possible outcomes.
Using the binomial theorem, we can write this in vector form as

\[ g^{(k)} = Q \sum_{m=0}^{k} \binom{k}{m} g^{(m)} + R\mathbf{1}, \qquad g^{(0)} = \mathbf{1}, \]

or, since $Q\mathbf{1} + R\mathbf{1} = \mathbf{1}$ (with vectors of ones of the appropriate dimensions),

\[ (I - Q) g^{(k)} = \mathbf{1} + Q \sum_{m=1}^{k-1} \binom{k}{m} g^{(m)}. \]

Thus, since $N$ is the inverse of $I - Q$,

\[ g^{(k)} = N\mathbf{1} + NQ \sum_{m=1}^{k-1} \binom{k}{m} g^{(m)}. \]

This is a recursion relation for $g^{(k)}$ in terms of the values for smaller $k$.
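The matrix operations of this section can be sketched for the five-state random walk, with the absorbing states 1 and 5 listed first and the transient states 2, 3, 4 after them. The helper names and the Gauss-Jordan inversion are illustrative assumptions; the second moment uses the recursion in the form $g^{(2)} = N\mathbf{1} + 2NQg^{(1)}$, which is the case $k = 2$.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse(A):
    """Inverse of a square matrix by Gauss-Jordan elimination (exact fractions)."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

H = Fraction(1, 2)
Q = [[0, H, 0], [H, 0, H], [0, H, 0]]      # transient states 2, 3, 4
R = [[H, 0], [0, 0], [0, H]]               # into absorbing states 1, 5

I_minus_Q = [[Fraction(int(i == j)) - Q[i][j] for j in range(3)] for i in range(3)]
N = inverse(I_minus_Q)                     # fundamental matrix
B = mat_mul(N, R)                          # absorption probabilities
ones = [[Fraction(1)] for _ in range(3)]
g1 = [row[0] for row in mat_mul(N, ones)]  # mean times to absorption
NQ_g1 = mat_mul(mat_mul(N, Q), [[x] for x in g1])
g2 = [g1[i] + 2 * NQ_g1[i][0] for i in range(3)]   # second moments
variance = [g2[i] - g1[i] ** 2 for i in range(3)]
print(N, B, g1, g2, variance)
```

Starting from the middle state, the mean time to absorption comes out as 4 steps and each absorbing state is reached with probability 1/2.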
This relation has the following solution Let 

With A{r, 0) = 5^0 and A(r, 1) = 1 ^ro Then 

gOt) _ 

We thus see that all the moments of t can be obtained by simple matrix
operations on N. The matrix N itself is the inverse of $I - Q$. Thus
it is a simple matter, using machine computation, to find these from a
transition matrix for an absorbing chain. It is instructive to use the
quantity t to compare the matrix method with the method used previously,
when the eigenvalues are known.

The probability that t = n, as a function of the various starting states,
is given by the components of the vector

$$Q^{n-1} R\,\mathbf{1}.$$

That is, for t to equal n it must be that after n - 1 steps the
process was in some transient state and on the next step it moved into
some absorbing state. If we can diagonalize Q, then these probabilities are
easily computed and we can find the exact distribution of t.

A particularly simple quantity is the length of time that the
chain remains in a particular state. Given that
the process has been in state i for the last m steps, the probability that it
remains k more steps is independent of m. The mean length of time that
the process remains in state i is $1/(1 - p_{ii})$ and the variance is
$p_{ii}/(1 - p_{ii})^2$.

It is also a simple matter to compute the probability $h_{ij}$ that starting in
the transient state i the transient state j is ever entered. Then the sum
$\sum_j h_{ij}$ gives the mean number of different transient states that are ever
reached starting in the state i. Formulas for these quantities in terms of
the fundamental matrix N may be found in Kemeny and Snell (1960,
Chapter 3).

It is often convenient to modify the given chain and to study the resulting
related chain. For example, suppose we are interested in the number of
different changes of state that take place in our process. Then we simply
form a new chain that has a transition matrix P' obtained from P by
replacing $p_{ii}$ for the transient states by 0 and renormalizing the rows so
they sum to 1. This new chain represents the old chain watched only at
times that it changes state. The time to absorption in this chain is the same
as the total number of changes of state in the original chain. Thus its
distribution and moments give information about the original chain.
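The construction of P' described above (zero the diagonal entry of each transient row, then renormalize) can be sketched as follows. The function name `jump_chain` and the three-state example are hypothetical, chosen only for illustration.

```python
def jump_chain(P, transient):
    """Return P' with p_ii set to 0 on the transient rows and rows renormalized."""
    P_new = [row[:] for row in P]
    for i in transient:
        stay = P[i][i]
        if stay >= 1.0:
            raise ValueError("a transient state cannot have p_ii = 1")
        # Dividing by 1 - p_ii makes the remaining entries of the row sum to 1.
        P_new[i] = [0.0 if j == i else p / (1.0 - stay)
                    for j, p in enumerate(P[i])]
    return P_new

# Hypothetical chain: state 0 is absorbing, states 1 and 2 are transient.
P = [[1.0, 0.0, 0.0],
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]
P_prime = jump_chain(P, transient=[1, 2])
```

Absorbing rows are left untouched, so the new chain is absorbed in the same states as the original one.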
As a second example of forming a new chain, assume that we are
interested in observing the original chain only when it is in a subset E
of the transient states. This gives us a new absorbing chain, and its
transition matrix $P^E$ is easily obtained from the fundamental matrix N. In fact,
if $N_E$ is simply the matrix N restricted to rows and columns corresponding
to states in E, then $Q^E = I - N_E^{-1}$, where $Q^E$ is the Q part of the new
transition matrix $P^E$. Then



There is an important connection between classical potential theory and
Markov chains. Of course, classical potential theory seems a long way
from psychology, but this connection has led to new results for general
Markov chains, so we mention it briefly here.

A potential is defined by a function g over the transient states that has
the form g = Nf, where f is a nonnegative function on the transient states.
(Here we are representing functions as column vectors so that $f_i$ is the
value of f at the state i.) The support of the potential is the set on which
the charge f has a value different from 0. We can give a game interpretation
for a potential in which a player receives an amount $f_j$ every time the
process is in state j. Then if the process is started in the transient state i,
the value of the potential g at the state i, that is, $g_i$, is the expected total
winnings.
If E is a subset of transient states, then the equilibrium potential is
defined to be the one with support in E that has the constant value 1 on
the set E. In our game interpretation, we are asked to choose the payments
so that the player is paid only when in E and such that his expected
winnings are a constant no matter where in E he starts. Restricting everything
to E, we are asked to solve the equations

$$\mathbf{1} = N_E f_E$$

for $f_E$. But this is $f_E = (I - Q^E)\mathbf{1}$. The charge can be interpreted as
follows: $f_i$ is the escape probability, that is, the probability that the
process, started at a state i in E, leaves E on the first step and never
returns. We can also interpret the values of the corresponding potential
g = Nf: for a state i not in E, the value of g represents the probability
that starting at i the set E is ever entered.

In classical physics the capacity of a set is the total equilibrium charge.
In this case, we define capacity relative to a nonnegative row vector
$\alpha$ such that $\alpha Q \le \alpha$; any row of N has this property.
Then the capacity relative to $\alpha$ is defined as

$$c(E) = \sum_i \alpha_i e_i,$$

where $\{e_i\}$ is the charge of the equilibrium potential. It is possible
to prove, as in the classical case, that as the set increases, the capacity
increases. This gives information about the escape probabilities. For
example, as we increase the set, the capacity increases; hence, although
the escape probabilities decrease, they cannot decrease too fast.
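The equilibrium potential of a set E can be computed directly from the fundamental matrix by solving $N_E f_E = \mathbf{1}$. The sketch below uses a hypothetical random walk on 0, ..., 4 and a hypothetical subset E; it also checks the escape-probability reading of the charge.

```python
import numpy as np

# Hypothetical example: symmetric random walk on 0..4 with absorbing ends.
# Rows and columns of Q are indexed by the transient positions 1, 2, 3.
Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
E = [0, 1]                              # indices of the states forming E

N = np.linalg.inv(np.eye(3) - Q)        # fundamental matrix
N_E = N[np.ix_(E, E)]                   # N restricted to E

f = np.zeros(3)                         # charge, supported on E
f[E] = np.linalg.solve(N_E, np.ones(len(E)))
g = N @ f                               # equilibrium potential

# Escape probability from i in E: 1 minus the chance that the first step
# lands in a state from which E is (re-)entered.
escape = 1.0 - Q[E] @ g
```

In this example `g` equals 1 on E, and its value at the remaining transient state is the probability of ever entering E from there.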


3.3 Ergodic Behavior for Finite Chains 


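We assume now that we have an ergodic chain, that is, a chain whose
states form a single recurrent class. The chain has a period d. For an
absorbing chain, the powers of the transition matrix converge. This fact is
replaced here by the following theorem.

Theorem 4. If P is the transition matrix of an ergodic chain, then

$$\frac{I + P + P^2 + \cdots + P^{n-1}}{n} \to A,$$

where each row of A is the same vector $\alpha = (a_1, a_2, \ldots, a_s)$. The
vector $\alpha$ is the unique probability vector such that $\alpha P = \alpha$.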
A proof of this theorem may be found in Kemeny and Snell (1960,
Chapter 5).

The ijth entry of $I + P + P^2 + \cdots + P^{n-1}$ is the mean number of
times the chain is in state j in the first n steps. Dividing by n, we obtain
the mean value for the fraction of time spent in state j. This theorem
states that the mean value of the fraction of time spent in state j approaches
a limiting value $a_j$ which is independent of the state in which the chain
started. We shall see later that a stronger result holds, namely, that the
fraction of the time in state j approaches $a_j$ with probability 1.

The limiting matrix A is easily found since $\alpha$ may be determined simply
by writing down the linear equations necessary for $\alpha$ to be a probability
vector and $\alpha = \alpha P$. This vector also has the interpretation that if it is
the initial probability vector, then the probability of finding the chain in
the states at any later time is also $\alpha$. In fact, with this choice and only
with this choice, the resulting process is a stationary process.
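The linear equations for the fixed probability vector can be solved numerically. The following is a sketch under stated assumptions: the helper names are hypothetical, and a least-squares solve is used here simply as a convenient way to impose both the fixed-vector equations and the normalization. The two-state example has period 2, so the averages converge although the powers of P do not.

```python
import numpy as np

def stationary_vector(P):
    """Solve alpha P = alpha together with sum(alpha) = 1."""
    P = np.asarray(P, dtype=float)
    s = P.shape[0]
    A = np.vstack([P.T - np.eye(s), np.ones((1, s))])
    b = np.append(np.zeros(s), 1.0)
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alpha

def cesaro_average(P, n):
    """Return (I + P + ... + P^(n-1)) / n."""
    P = np.asarray(P, dtype=float)
    total = np.zeros_like(P)
    power = np.eye(P.shape[0])
    for _ in range(n):
        total += power
        power = power @ P
    return total / n

# Hypothetical ergodic chain with period d = 2.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
alpha = stationary_vector(P)
A_n = cesaro_average(P, 1000)
```

Here every row of `A_n` is already equal to `alpha`, while odd powers of `P` remain equal to `P` itself.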

In the case d = 1, the basic theorem can be strengthened to state that
$P^n \to A$. That is, the probability of finding the chain in state j after
n steps approaches a limiting value independent of the starting state i.

In a recurrent class the process simply moves around through the
states, returning to each state infinitely often with probability 1. Hence
we are interested in quantities which have to do with the behavior in
going from one state to another as well as in quantities having to do with
the average time spent in the states.

The fundamental matrix for transient chains is
$N = (I - Q)^{-1} = I + Q + Q^2 + \cdots$. In the recurrent case Kemeny and
Snell (1960, Chapter 5) showed that many of the descriptive quantities can be
obtained in terms of the matrix $Z = (I - P + A)^{-1}$. In the regular case this
matrix has the power series expression

$$Z = I + (P - A) + (P^2 - A) + \cdots$$

It does not have as simple an interpretation as the matrix N.

The mean first passage time $m_{ij}$ is defined to be the mean time starting
in state i to reach state j for the first time. If i = j, $m_{ii}$ is the mean time
to return to i starting in i. The matrix $M = \{m_{ij}\}$ is called the mean first
passage matrix. The entries $m_{ii}$ are particularly simple, namely, $1/a_i$.
This is intuitively quite plausible since we expect to be in the state i a
fraction $a_i$ of the time, so it should take, on the average, $1/a_i$ steps to
return. The remaining entries are obtained from Z by $m_{ij} = (z_{jj} - z_{ij})/a_j$.
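The computation of the mean first passage matrix from Z can be sketched as follows, assuming the relation $m_{ij} = (z_{jj} - z_{ij})/a_j$ for the off-diagonal entries and $m_{ii} = 1/a_i$ on the diagonal. The function name and the three-state transition matrix are hypothetical.

```python
import numpy as np

def mean_first_passage(P):
    """Return (alpha, Z, M) for a regular chain with transition matrix P."""
    P = np.asarray(P, dtype=float)
    s = P.shape[0]
    # Fixed probability vector: alpha P = alpha, sum(alpha) = 1.
    system = np.vstack([P.T - np.eye(s), np.ones((1, s))])
    alpha, *_ = np.linalg.lstsq(system, np.append(np.zeros(s), 1.0), rcond=None)
    A = np.tile(alpha, (s, 1))                  # limiting matrix, every row alpha
    Z = np.linalg.inv(np.eye(s) - P + A)        # fundamental matrix
    M = (np.diag(Z)[np.newaxis, :] - Z) / alpha[np.newaxis, :]
    np.fill_diagonal(M, 1.0 / alpha)            # mean return times
    return alpha, Z, M

# Hypothetical regular chain on three states.
P = np.array([[0.50, 0.25, 0.25],
              [0.50, 0.00, 0.50],
              [0.25, 0.25, 0.50]])
alpha, Z, M = mean_first_passage(P)
```

One way to check the result without trusting the formula is the first-step relation $m_{ij} = 1 + \sum_{k \ne j} p_{ik} m_{kj}$, which `M` should satisfy entry by entry.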

The variance of the time to go from i to j can also be obtained from the
fundamental matrix Z. In particular, if the process is started in state i,
the variance of the time required to return for the first time is
$(2z_{ii} - a_i - 1)/a_i^2$. Let us start the process in state i and consider the
time required to return for the first time, the time between the first return
and the second return, and in general the time between successive returns.
These times are independent and identically distributed random variables.
If $S_n^i$ is the number of times the process is in state i in the first n steps,
then from this sequence we can obtain results about $S_n^i$. In the next section
we shall mention more general limit theorems.

Another quantity of interest is the covariance of the times in two
different states. Let us start the process in state 1 and let $Y^j$ be the number
of visits to state j before returning to state 1.

i_» = Cov [P, Y‘] 

LetS„ibethenumberofvisitstostateyinthefirs.nsteps Then.tcanbe 

shown that v 

l,ni|0Cov[S„',S.*]) = A, 

exists independent by 

Cfj can be computed from tne i 

The quantities e„ and *.'* whmrenabk onfrdo this 

set of quantities to the others The formulas wni 

, , .( a’ . “(“< c \ 


and 


This connection was observed by Kolmogorov (1962).


3.4 Limit Theorems for Ergodic Chains

Let $X_1, X_2, \ldots$ be the outcome process of an ergodic chain, let f be a
function on the state space, and
consider the new process $f(X_1), f(X_2), \ldots$. That is, this new process
records f(j) whenever the original process was in state j. We are interested
in studying the sums $S_n = f(X_1) + f(X_2) + \cdots + f(X_n)$. If f is the
function that has the value 1 on the state j and 0 on all other states, then
$f(X_n)$ is 1 if the chain is in state j on the nth step and 0 otherwise. Thus
$S_n$ represents the total number of times that the chain is in state j in the
first n steps. It is helpful in thinking of the general case to interpret our
process as a game in which the player receives f(j) every time the chain
is in state j. The expected winning then on each play when the process
is in equilibrium is

$$m = \sum_j a_j f(j).$$


The first theorem, the law of large numbers, states that the player's
average winning in the long run is equal to this mean value.

Theorem 5 (Law of Large Numbers for Markov Chains). Let f be any
function on the state space of an ergodic Markov chain $\{X_n\}$. Let
$S_n = f(X_1) + f(X_2) + \cdots + f(X_n)$. Then the averages $S_n/n$ converge
with probability 1 to the constant value $m = \sum_j a_j f(j)$ independent of the
starting distribution.

For the law of large numbers, the constant m plays the role of the
common mean when the random variables are independent. For the
central limit theorem we need a quantity that will play the role of
the common variance per experiment in independent processes. This is
provided by the following result.

Theorem 6. Under the assumption of the previous theorem, $(1/n)\,\mathrm{Var}[S_n]$
converges as n tends to infinity to a constant $b^2$.
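For a regular finite chain the limiting constant can be computed from the matrix $Z = (I - P + A)^{-1}$. The sketch below assumes the standard limiting-covariance expression $c_{ij} = a_i z_{ij} + a_j z_{ji} - a_i \delta_{ij} - a_i a_j$ and takes $b^2 = \sum_{i,j} f(i)\, c_{ij}\, f(j)$; this formula is brought in as an assumption and is not stated in the passage above. The function name and the two-state chain are hypothetical.

```python
import numpy as np

def limiting_mean_and_variance(P, f):
    """Return (m, b2): the limits of E[S_n]/n and Var[S_n]/n for a regular chain."""
    P = np.asarray(P, dtype=float)
    f = np.asarray(f, dtype=float)
    s = P.shape[0]
    system = np.vstack([P.T - np.eye(s), np.ones((1, s))])
    alpha, *_ = np.linalg.lstsq(system, np.append(np.zeros(s), 1.0), rcond=None)
    Z = np.linalg.inv(np.eye(s) - P + np.tile(alpha, (s, 1)))
    D = np.diag(alpha)
    # Assumed formula: c_ij = a_i z_ij + a_j z_ji - a_i delta_ij - a_i a_j.
    C = D @ Z + (D @ Z).T - D - np.outer(alpha, alpha)
    return float(alpha @ f), float(f @ C @ f)

# Hypothetical two-state chain; f is the indicator of the first state.
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
m, b2 = limiting_mean_and_variance(P, [1.0, 0.0])
```

When all rows of P are equal (independent trials), the expression reduces to the ordinary variance of f under $\alpha$, which is a useful consistency check.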

Theorem 7 (Central Limit Theorem). If $b^2 > 0$, the distribution of

$$\frac{S_n - nm}{b\sqrt{n}}$$

converges to a normal distribution.

Theorem 8 (Law of the Iterated Logarithm). Suppose $b^2 > 0$. If $\lambda > 1$,
then with probability 1

$$S_n < nm - \lambda b \sqrt{2n \log\log n}$$

only a finite number of times, and if $\lambda < 1$, then with probability 1
this inequality holds infinitely often.
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The proofs of these theorems “Tefus dXe 

Interpreting /(X„) again as the to the 

Z„ to be the total winning between the Hentically distributed and 

state 1 Then Z.Z. is a sequence between the 

independent random variables „ _ and 

nth and the (n + l)st return o st t^^^ variables are independ- 

the mean value of IS ni-n^nd 

- - K... ... 0 i, .rt.; *.«;<- '"-ySi™ 

ent random variables to these " these theorems is 

are obtained The basic quan i y y ^ i nt Details of these 

also the variance of the random variables Z„ r„ 
proofs may be found in Chung (1 ) t,n„t theorem available 

There is also a slightly different type of limit theorem available
for ergodic chains. Suppose that we are interested in the number
of times the process is in each state. Recall
that $S_n^j$ is the number of times in state j in the first n steps. Then we can
form the vector $S_n = (S_n^1, S_n^2, \ldots, S_n^s)$ if there are s states. Then it is
possible to prove a central limit theorem for the random vector

$$\frac{S_n - n\alpha}{\sqrt{n}}.$$

The limiting distribution is determined by the matrix of limiting
covariances. For details of this result see Kolmogorov (1962).

As in transient behavior, potential theory has also suggested a number
of new probabilistic results for Markov chains. We mention some

typical results . ^ the chain m state /, 

^Lt £ be any set “”procLs is m state A before returmuj 

be the mean number of times Ih P ^ [past once 

to 1 provided f t d tas en.emd^‘_^^^, 

fern w = Tin ' ^ 

J re C=c.rof’^ relanve to « for 

A verv peculiar result suggestea ; u nas 

r (‘n'ere a- Shrrow ’vector u ^;^^=;"‘“;obab,hsUc m.er- 

IS . = -- 1 

states, a new oia.. 
theory it is possible to obtain simple matrix expressions for the transition
matrix of this new process.

Details of this application of potential theory may be found in Kemeny
and Snell (1961, pp. 114-115).

3.5 Applications and Examples of Finite Chains

As an example of the application of finite Markov chains, we discuss
briefly the pattern stimulus model defined in Sec. 1.4. In the simple
contingent case the basic outcome process is a Markov chain with stationary
transition matrices. When there are two patterns, a state is a vector of
the form (u v s a e), where u is the connection of the first pattern, v the
connection of the second pattern, s the stimulus pattern sampled, a
the response made by the subject, and e the response reinforced by
the experimenter.

The type of chain that results depends on the reward schedule of the
experimenter. Recall that this is determined by a matrix


1 2 



where $\pi_{ij}$ is the probability that response j is rewarded when response i
is made. Assume first that the reward matrix is of the form


1 2 



That is, the experimenter is sure to reward response 1 when it is made.
When the patterns become connected to 1, their connections can never
change and the subject will continue to make response 1. Then the states
(1 1 1 1 1) and (1 1 2 1 1) are recurrent states. All other states are transient.
The process eventually reaches one of these two states, and then moves
between these two states. To study the process in detail, we would treat
these two states as a single absorbing state, and so have an absorbing
chain with a single absorbing state. The case

•"lil 

0 ij 


IS similar 
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If the matrix is of the form 


1 2 


1 n 0" 

"2L0 iJ. 


the experimenter always rewards the iwu*! 1 (22 1 2 2), 

this case we have four recurrent 1 smsle’ recurrent class and the 

and (2222 2) The first J,es are transient 

'1n‘:fi o^r c"settf an states forms a single recurrent Cass 

The period is one except when c 1 an 

1 2 




^ ^ the 

m which case the period is 2 For 1212) From (21121) 

process is sure to move to cither (2 1 1 2 i) o t 

It IS possible to move to (1 1 1 1 V , make the recurrent 

To study the cases where '“'“The length of time to absorp- 

states absorbing and use ^ complete learning From the 

tion can be interpreted as e response and is a way 

time he is absorbed the „ known, the moments of 1 

reinforced If the 

time can be obtained rthtnincd «e#. i 

distribution of this time may e number of times that 

Other interesting quantities a jponsc 1 is reinforced T 

IS made and the number of ,'”5 Define a function / by 

fundamenlal methods for functions of a recurre 

the covariances usi g S4-8S) , „ i.miimg 

Kcmcny and Snell ( -P^ course, interested ' 

in the tec“7'f ,he eharaeletistie function of the s tau 
probabilities 1 e / characteristic function of the 

the response is I 
the response 1 is reinforced. Then $f(X_1) + f(X_2) + \cdots + f(X_n)$ gives the
number of responses of type 1 and $g(X_1) + g(X_2) + \cdots + g(X_n)$ the
number of reinforcements of response 1 in the first n trials. We know
from the law of large numbers that

$$\frac{f(X_1) + f(X_2) + \cdots + f(X_n)}{n}$$

converges with probability 1 to a limiting value a. This is the average
number of responses of type one in the long run. Similarly,

$$\frac{g(X_1) + g(X_2) + \cdots + g(X_n)}{n}$$

converges with probability 1 to a limiting value b. It is a peculiarity of
these learning models that a = b. That is, the subject and the experimenter
in the long run match each other in the frequencies with which a
response is made and reinforced. We know that these quantities have
limiting variances and covariances.

We know also that we can apply a central limit theorem to the number
of type one responses and type one reinforcements. Similarly, we have
the law of the iterated logarithm available.

Estes has proceeded somewhat differently in studying this model.
He has focused on the connection process which, as we observed, is a
Markov chain. It is somewhat simpler than the chain that we have been
considering, and he was able to compute the basic quantities for this
chain. The transition matrix for the connection process is obtained as
follows. Recall that a state is now the number of stimulus patterns
connected to response 1. If there are N stimulus patterns, there are
N + 1 states 0, 1, 2, ..., N. Assume that the chain is in state i. Then if
i = 0, it either remains in this state or moves to 1. If it is a state with
0 < i < N, then it can remain in this state or move to i + 1 or i - 1. If
it is in state N, it either remains there or moves to N - 1. This is because
on a single trial only one pattern can be affected. Assume that 0 < i < N.
We shall illustrate the computation of $p_{i,i-1}$. To go from state i to state
i - 1 the subject must sample one of the patterns connected to
response 1, he must make response 1, then the experimenter must reinforce
response 2, and, finally, the pattern sampled must change its connection.
The probability that a pattern connected to response 1 is sampled is i/N.
The subject then makes response 1. The experimenter reinforces response
2 with probability $\pi_{12}$. The connection then changes with probability c.
Hence $p_{i,i-1} = (i/N)\,c\,\pi_{12}$.
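In a similar manner we find the other transition probabilities. They are

$$
\begin{aligned}
p_{00} &= 1 - c\pi_{21}, \\
p_{i,i-1} &= \frac{i}{N}\, c\pi_{12}, & i &= 1, 2, \ldots, N, \\
p_{ii} &= 1 - \frac{i}{N}\, c\pi_{12} - \frac{N-i}{N}\, c\pi_{21}, & i &= 1, 2, \ldots, N-1, \\
p_{i,i+1} &= \frac{N-i}{N}\, c\pi_{21}, & i &= 0, 1, 2, \ldots, N-1, \\
p_{NN} &= 1 - c\pi_{12}.
\end{aligned}
$$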
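The transition probabilities of the connection process can be assembled into a matrix as in the sketch below. The function names and the parameter values are hypothetical. The binomial form of the fixed vector, with parameter $\pi_{21}/(\pi_{12} + \pi_{21})$, is a consequence worked out here from the balance between neighboring states; it is offered as a check, not as a statement from the text.

```python
from math import comb

def pattern_model_matrix(N, c, pi12, pi21):
    """Transition matrix of the connection process on states 0, 1, ..., N."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        down = (i / N) * c * pi12          # p_{i,i-1}; zero when i = 0
        up = ((N - i) / N) * c * pi21      # p_{i,i+1}; zero when i = N
        if i > 0:
            P[i][i - 1] = down
        if i < N:
            P[i][i + 1] = up
        P[i][i] = 1.0 - down - up
    return P

def binomial_vector(N, theta):
    """Binomial(N, theta) probabilities, used here as a candidate fixed vector."""
    return [comb(N, i) * theta ** i * (1.0 - theta) ** (N - i)
            for i in range(N + 1)]

# Hypothetical parameter values.
N_patterns, c, pi12, pi21 = 4, 0.5, 0.3, 0.6
P = pattern_model_matrix(N_patterns, c, pi12, pi21)
alpha = binomial_vector(N_patterns, pi21 / (pi12 + pi21))
alpha_next = [sum(alpha[i] * P[i][j] for i in range(N_patterns + 1))
              for j in range(N_patterns + 1)]
```

With these values `alpha_next` reproduces `alpha`, so the long-run proportion of patterns connected to response 1 is $\pi_{21}/(\pi_{12} + \pi_{21})$ in this example.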



Note that if we think of this chain as moving on the
integers, it can go only to neighboring states. Such a chain is called
a random walk by Karlin and McGregor (1959), and they have developed
elaborate methods for studying them. We shall discuss these processes
later.

However, for the chain described above Estes has given a detailed
study and computed a large number of interesting descriptive quantities.

See Bush and Estes (1959) . u « K..<an studied a great deal by Markov 

Another stimulus model which has ,,,,5 n,odel it is 

chain methods is the number of stimulus elements Each 

assumed that there are a "“'AAe subject samples a subset of 

element ts connected to one „ probability equal to the 

the elements He makes ^“P™^ * 55 , ihe sample chosen f no 

fraction of elements connected to resp ,, „,cessary AlUle- 

stimulus elements are sampled, „.o„ons changed, if necessary, 

rnersin the set sampled have their connections^^ 5,^^ 

agree with the choice f '’’' “^b.Iny 0 and that the samplmgs^^^^^ 
stimulus element is sampled ™''h 'Im 

independent The connection pro«s or simple ccn'.ngen 

state 1 IS again a Markov chain m the nOT e ^,,55,b,„g chain with 
c^se in this model the connection ^P55,p,„g „,ih stale 0 

smte jy absorbing if 1 always ''"“’^pscrbing with bolt, stales abm 

absorb.nB.f2.salwa>srcwarded ^Ins^^ "".“tcharn «„ be 

mg if Ihc ,jLn, cUss^ Discussions of thi 

It .5 always a 957. 1960) and m Bush an^ „„ac,|;,nc 

found in Kcmeny a Mosteller there is , ced in 

,n the linear of as generating „„c 
with continuous state space exists which seems satisfactory for these
applications. The result is that these processes have had to be studied
individually. The most complete treatment of the asymptotic behavior
of these processes that has been given is by Karlin (1953). The behavior
is quite similar to the connection process in the finite stimulus model.
For example, consider the special case

< (1 -e)p+0 

dp 

If response 1 is always rewarded, then we have the case 



The process then goes with probability 1 to 1 as a limiting value. Unlike
the absorbing chain, it never actually reaches 1. If the response made by
the subject is always rewarded, then the transition probabilities are



If the process gets near 0 it is very likely to move even closer to 0, and
if it gets near 1 it is very likely to move even closer to 1. The
process can be shown to converge with probability 1 to 0 or to 1.
Thus, the remaining problem is the probability that the process ends up in
each of these possible positions.

When either response may be rewarded independent of what the subject 
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J VH tn a nrocess which acts like a recurrent Markov chain 

to the interval infinitely often The probabilities 
PrlX„ < m 

converge to a limiting 

fft^nr itXr tL Cham is -ed "h is kno™ about 
these distributions Even in the contingent case, 

Sec 2, they can be quite complicated resulting experimenter’s 

Analysis of the P™““?„ethods A detailed account of the 

processes must answered for this model exists in the book 

types of questions that can be an 

by Bush and Hosteller (1955) again necessary lo use 

In the treatment given by Estes “ PP ^ quantities This has been 
special techniques to obtain Suppes m Bush and Estes 

done, for example, in the ^ 12 14 of Bush and Estes (1959) 

(1959) See also 0'4P“" ' V ’ c^ams is found in the model 

Another application °f experiments relating to 

developed by B Cohen (19 ) ’’^,,1, ^ group of pretrained con 

In these experiments a subject jet is led to believe that he is 

federates of the experimenter perception On each of a 

participating m a group “ roup is required to choose aloud 

sequence of trials, each mem ,,nes ,hat has the same eng 

that line one from among th F 

as a standard line . comes only ^ftcr c 

The subject’s turn to choow jj responses of the 

heard the unanimous, “PP°“6h jrding to his 

Presumably, he is motivated choice of .he group, 

on the one hand, and to conform to the u „„j of 

on the other hand responses each response P'‘"E 

describe the data He ass 
one of four mental slates 

State 1 Nonconforming
State 2 Temporary nonconforming
State 3 Temporary conforming
State 4 Conforming
The model assumes that the subject starts in state 2. The transition
matrix, with rows and columns indexed by the states 1, 2, 3, 4, is assumed
to be of the form

$$
P = \begin{pmatrix}
1 & 0 & 0 & 0 \\
\alpha & 1 - \alpha - \beta & \beta & 0 \\
0 & \gamma & 1 - \gamma - \varepsilon & \varepsilon \\
0 & 0 & 0 & 1
\end{pmatrix}.
$$

This is a four-state absorbing Markov chain. The entries $\alpha$, $\beta$, $\gamma$, and
$\varepsilon$ are parameters to be estimated from the observed data. It is assumed that
when the subject is in state 1 or 2, he gives a nonconforming response a,
and when he is in state 3 or 4, he gives a conforming response b. When
the experiment is performed, the experimenter is not able to observe the
states of this chain, but only when the subject conforms or does not
conform.

If we denote by $X_1, X_2, \ldots$ the outcomes of the Markov chain, then the
response can be described as the process $f(X_1), f(X_2), \ldots$, where f is a
function on the state space that has the value a on states 1 and 2 and the
value b on states 3 and 4. This process is not a Markov process. For
example, let us assume that the underlying process is started in states 2
and 3 with equal probability. Let us compare the two probabilities

(i) $\Pr[f(X_3) = b \mid f(X_2) = b \wedge f(X_1) = a]$

and

(ii) $\Pr[f(X_3) = b \mid f(X_2) = b \wedge f(X_1) = b]$.

The information that $f(X_1) = a$ tells us that $X_1 = 2$, and then if $f(X_2) = b$,
we know that $X_2 = 3$. Hence the first probability is $1 - \gamma$. In the second
case it is possible that $X_2 = 3$ or $X_2 = 4$. Since in the latter alternative it
is certain that $f(X_3) = b$, we see that (ii) will be a larger number.
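The comparison of the two conditional probabilities can be checked numerically. The sketch below sums over the hidden states step by step; the parameter values are hypothetical and chosen only for illustration, and the helper names are not from the text.

```python
def cohen_matrix(alpha, beta, gamma, eps):
    """Transition matrix of the four-state conformity chain (states 1..4)."""
    return [[1.0, 0.0, 0.0, 0.0],
            [alpha, 1.0 - alpha - beta, beta, 0.0],
            [0.0, gamma, 1.0 - gamma - eps, eps],
            [0.0, 0.0, 0.0, 1.0]]

def response(state_index):
    """Observed response: 'a' in states 1 and 2 (indices 0, 1), 'b' otherwise."""
    return "a" if state_index < 2 else "b"

def response_probability(P, start, responses):
    """Probability that f(X_1), f(X_2), ... equals the given response string."""
    size = len(P)
    weights = [start[s] if response(s) == responses[0] else 0.0
               for s in range(size)]
    for r in responses[1:]:
        weights = [sum(weights[i] * P[i][j] for i in range(size))
                   if response(j) == r else 0.0
                   for j in range(size)]
    return sum(weights)

# Hypothetical parameter values; the chain starts in state 2 or 3 with equal
# probability, as in the example above.
P = cohen_matrix(alpha=0.1, beta=0.3, gamma=0.2, eps=0.25)
start = [0.0, 0.5, 0.5, 0.0]

prob_i = response_probability(P, start, "abb") / response_probability(P, start, "ab")
prob_ii = response_probability(P, start, "bbb") / response_probability(P, start, "bb")
```

With these values `prob_i` equals $1 - \gamma$ and `prob_ii` is strictly larger, so the response process depends on more than the last observed response.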

Since the observed process is not a Markov chain, we cannot apply 
Markov chain methods directly to it but must instead apply them to the 
underlying chain and from this obtain information about the observed 
process 

3.6 Denumerable Chains with Discrete Time

We consider now a chain with a denumerable set of states that we label
0, 1, 2, .... The basic decomposition into transient and recurrent states is
the same as in the finite case, but, of course, we may have a denumerable
number of different recurrent classes. However, each recurrent class still
has a finite number d of cyclic subclasses. In the finite case a chain must
have at least one ergodic class because the process must leave the transient
states eventually, and when it does it must have somewhere to go. However,
in the denumerable case this is not so. For example, the trivial chain in
which a particle moves with probability 1 to the next higher integer on each
trial has only transient states and no recurrent states.

In studying transient behavior, it is again convenient to make the
recurrent states absorbing, thereby obtaining the canonical form

$$P = \begin{pmatrix} I & 0 \\ R & Q \end{pmatrix}.$$


There is no general method for diagonalizing infinite matrices, but
many of the methods developed for finite chains that use the matrix
$N = I + Q + Q^2 + \cdots$ still apply. In the denumerable case this infinite
series converges, and the entries again give the mean number of times the
process enters state j given it started in state i.

It is still true that $N(I - Q) = (I - Q)N = I$,
so that N is an inverse of $I - Q$. However, there may be other
matrices which are also inverses of $I - Q$, and so N is not uniquely
determined by the equations of an inverse. The reason for this is that the
equation $(I - Q)g = 0$

may have a nontrivial solution, this “"“„”°‘„?^Peranmh=r inverse 
If It does, adding such a solution ‘o “ ^ ^ minimal nonnegative 

for / - e The matrix N is characterized = * t" „,tr.x such that 

> N,„ for all i and; ,1 „es the probabilities 

of absorption in th probabilities necessarily add up to 

It IS no longer true protabil.ty that the proKSS « ^ r,t,ve 

there may P chain in which 0 is absorbing 

An example °f ‘^,”'0 ,he ngh, with probability J and to the 

integers a step 

'7S.74.,« "«r “”5;vs 

'."hfr^lev'm moments of time to absorption are finite. 
in the finite case apply. Recall that, in the derivation of these quantities,
we showed that the moments of the absorption time represented by the
vector g satisfied an equation of the form

$$(I - Q)g = f,$$

where f is a known nonnegative function. We then concluded that
g = Nf. In the infinite case, as we have pointed out previously, the
equation $(I - Q)g = f$ may have solutions other than g = Nf. Some
additional argument is needed to show that g = Nf is the solution we are
looking for. Recall that a function of the form g = Nf was called a
potential, in analogy with classical potential theory. Thus we see that we
must show that the moments g are a potential. In problems such as this
the study of potential theory for Markov chains has greatly increased
our knowledge of solutions of very simple probability problems for
denumerable chains.

We have observed that a basic difference between the transient theory 
for the finite and for the infinite case is that in the former the chain has a 
cha When the Markov 

trivirone '’.I "f t a less 

trivial example is the three-dimensional random walk. This process moves on
the points of the form $(n_1, n_2, n_3)$ with integer coordinates. It moves
with probability 1/6 to each of the six neighboring points

$(n_1 + 1, n_2, n_3)$, $(n_1 - 1, n_2, n_3)$, $(n_1, n_2 + 1, n_3)$,
$(n_1, n_2 - 1, n_3)$, $(n_1, n_2, n_3 + 1)$, $(n_1, n_2, n_3 - 1)$.

a^'finiteserUcLroTr a" “ =^o"Wally leaves 

a final position, f bounlTycSthe m"''’," 

This consists of addmo .h/i , Martin boundary has been added 
that in a well defined fensc^the n ®P“oe and then proving 
at one of these new ideal points'^ ““ “ obsorbing state or 

a- T"' ■” °f >»= oquanon 
naluml way when Ve studTdef T “se in a 

A discussion of the Martin boundary is given by Doob (1959).


3.7 Recurrent Denumerable Chains

In a recurrent class the process is certain to return to any state i, but
the mean time required for this return may be finite or infinite.
If it is finite for one state, the same is true for all states in the class. In
addition, the mean first passage times are also finite. This leads to a basic
classification of recurrent chains.

Definition 20. A recurrent class is called recurrent positive if the mean
time to return to a state is finite for any state in the class and
recurrent null if this mean time is infinite.

An example of a recurrent null chain is P points 

moves on the integers, making a “ ^^^^jponding random walk 
with probability i toe dimensions the random walk 

m two dimensions Recall that m ,fi„,,eehiiin is recurrent positive 
process becomes transient ent positive is the one that 

An example of an infinite chain tha is ,_,o the left with probability 

movcsonthenonnegativeintegersmakmgastepto^ 

1 and to the right with probability i at all states except 

It returns to state 1 k the following 

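The basic ergodic theorem for denumerable chains is the following.

Theorem 9. Let P be the transition matrix of a recurrent chain with a
single recurrent class. Then

$$\frac{I + P + P^2 + \cdots + P^n}{n + 1}, \qquad n = 1, 2, \ldots,$$

converges to a limiting matrix A. In the recurrent null case, all entries
of A are 0. If P is recurrent positive, A has the same form as in the finite
ergodic case.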

A proof of this theorem may be “ti, Vnite ones Rccarrent 

Thm, recurrent positive chains beta ^ P™f 0 ^ 

null chains, however, are quite di number of steps tends to 

of finding the process in state , afta transient chain 

This IS like the transient ,^n,n keeps coming back It corn 

wanders off to mfimty e meu^ fnc . 

htaVto be fat away from |"7„^nt^Ti'v“^.u.ion to the equation 

components is •"b""' .nicrpretations For example. let A,j 
Its components still have "te^t^'t. „ starting at state 

mean nimtar a/ o single recurrent class 
Theorem 10 


Al: 


a, 
as n tends to infinity, where $\alpha = (a_1, a_2, \ldots)$ is any nonnegative solution
of $\alpha P = \alpha$.

Thus, although the mean time spent in a state tends to 0 in a null
chain, the ratio of the mean time for two different states approaches a
limit different from zero.

It is also true that for any recurrent chain the mean number of entries
into state j between occurrences of state i is finite and has the form $a_j/a_i$,
as in the finite case. A proof of these facts as well as a proof of Theorem
10 may be found in Chung (1960).


3.8 Applications and Examples of Denumerable Chains

Denumerable chains seem not yet to have found much application in
psychology. Nonetheless, it seems worthwhile to mention some rather
general classes of processes that have found quite general application and
about which a great deal is known. The first is a class of random walk
processes studied by Karlin and McGregor (1959).

A random walk is a denumerable Markov chain with state space the
integers 0, 1, 2, .... It has the property that on any one step it can move
at most one unit to the right or left. To specify the process completely it is
necessary to give three probabilities for each state n: $q_n$ is the probability
of moving one step to the left from state n, $r_n$ is the probability of
remaining in state n, and $p_n$ is the probability of moving one step to the right.
Thus $q_n + r_n + p_n = 1$. We assume that $q_0 = 0$. We have already
remarked that the connection process for the stimulus pattern model is of
this type, although with only a finite number of states.
Karlin and McGregor have introduced a powerful analytic method for
studying these processes. They define a system of polynomials $Q_n(x)$,
n = 0, 1, 2, ..., by setting $Q_0(x) = 1$ and solving recursively the equations

$$x Q_n(x) = p_n Q_{n+1}(x) + r_n Q_n(x) + q_n Q_{n-1}(x).$$
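The recursion for the polynomials can be evaluated at any point x by solving each equation for $Q_{n+1}(x)$. The sketch below is illustrative: the function names are hypothetical, and the example walk (reflecting at 0, symmetric elsewhere) is an assumption chosen because its polynomials turn out to be the Chebyshev polynomials $\cos(n\theta)$ at $x = \cos\theta$.

```python
import math

def km_polynomials(x, p, q, r, n_max):
    """Evaluate Q_0(x), ..., Q_{n_max}(x) from
    x Q_n = p_n Q_{n+1} + r_n Q_n + q_n Q_{n-1}, with Q_0 = 1 and q_0 = 0."""
    values = [1.0]
    for n in range(n_max):
        previous = values[n - 1] if n > 0 else 0.0   # unused at n = 0 since q_0 = 0
        values.append(((x - r(n)) * values[n] - q(n) * previous) / p(n))
    return values

# Hypothetical random walk: from 0 move right with probability 1;
# from n >= 1 move left or right with probability 1/2 each.
def p(n):
    return 1.0 if n == 0 else 0.5

def q(n):
    return 0.0 if n == 0 else 0.5

def r(n):
    return 0.0

theta = 0.7
values = km_polynomials(math.cos(theta), p, q, r, 6)
values_at_one = km_polynomials(1.0, p, q, r, 6)
```

Because $q_n + r_n + p_n = 1$, every $Q_n(1)$ equals 1, which gives a simple check on any implementation of the recursion.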

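They show that associated with these polynomials is a unique measure $\psi$
defined on the interval [-1, 1] such that

$$\int_{-1}^{1} Q_n(x)\, Q_m(x)\, d\psi(x) = 0, \qquad n \neq m,$$

$$\int_{-1}^{1} Q_n(x)^2\, d\psi(x) = \frac{1}{\pi_n},$$

where $\pi_0 = 1$ and $\pi_n = (p_0 p_1 \cdots p_{n-1})/(q_1 q_2 \cdots q_n)$.
In terms of this measure, called the spectral measure, they show that

$$p_{ij}^{(n)} = \pi_j \int_{-1}^{1} x^n\, Q_i(x)\, Q_j(x)\, d\psi(x).$$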






They are able, using this representation, to obtain criteria for
the states to be transient, recurrent positive, and recurrent null. In
addition, they give methods for computing quantities such as
the first passage distribution and many other quantities that we have
discussed. To find explicit results it is necessary to know a
great deal about the polynomials that arise. Fortunately,
many of the random walks that occur in practice are associated with
classical polynomials about which a great deal is known.


+< 

“ 21 « + “' 


II > 0 . 


-1 ™ 

random walks are the classical u p 

measure is dy- * (1 - of denumer- 

This class of chains provides examples positive, for a = 1 >I 

:errnuli:^:d“fo1^;nfi^-^^^^^^^ we mention am those 

The second class of chains that we mention are those
called branching processes. One of the first applications of
branching processes was to a problem in the social sciences, the
survival of family names. We assume that in the nth generation, that is, at
time n, there are a number of particles and that
each of these reproduces to form the next generation.
We adopt the neutral terminology of "particles." We assume that a single
particle produces j offspring with probability $p_j$, that this is the same for
all particles, and that they act independently. We then take as the state of
our Markov chain the total number of particles at any time. If at
time n the process is in state i, the state at time n + 1 is determined as
follows: it is the sum of i independent random variables, each of which has
the common distribution $p = (p_0, p_1, \ldots)$. State 0 is an absorbing state;
we interpret the process as dying out if 0 is reached. We assume that the
process is started in state 1, that is, with one particle.

The probability of absorption is, of course, an interesting property of
the process. Let the mean number of offspring produced by a single
particle be m. Then if m ≤ 1, the process dies out
with probability 1. If m > 1, there is a positive probability that it will
continue indefinitely, and if so, the population size tends to ∞.
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In the case m < 1, we would like to know the time to reach 0, that is, 
to be absorbed The mean time is 00 if m = 1 and finite if /« < 1 

It is also of interest to determine the distribution of the total number
of particles ever born. When $m > 1$, we can look for limit theorems for
the total size of the population at time $n$, and there is a very simple result.
It has been proved that the distribution of $X_n/m^n$ converges to a limiting
distribution $F$ which has a density function $f$. Furthermore, the process
$X_n/m^n$ converges almost everywhere to a limiting random variable.
It is illuminating to compare this limit theorem with the law of large
numbers, which also leads to convergence with probability 1, but there the
limit is a constant value. The branching process is one of the few examples
of a limit theorem in which the paths themselves converge with probability
1, but to different values in general.

The more general model, in which a number of different types of
particles reproduce differently, has also been studied in great detail.
All these results are discussed in a review paper by Harris (1951).
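The normalized population size $X_n/m^n$ can be explored by simulation. The sketch below is only illustrative: the offspring distribution, the number of generations, the seed, and the convention that the process starts with one particle at generation 0 are all assumptions, not taken from the text.

```python
import random


def mean_offspring(offspring_probs):
    """Mean number m of particles produced by a single particle."""
    return sum(j * pj for j, pj in enumerate(offspring_probs))


def simulate_generation_sizes(offspring_probs, generations, rng):
    """Return the population sizes for generations 0..generations, starting with one particle."""
    outcomes = list(range(len(offspring_probs)))
    sizes = [1]
    for _ in range(generations):
        current = sizes[-1]
        if current == 0:
            sizes.append(0)  # state 0 is absorbing
            continue
        # Each of the current particles reproduces independently.
        children = rng.choices(outcomes, weights=offspring_probs, k=current)
        sizes.append(sum(children))
    return sizes


rng = random.Random(0)
probs = [0.25, 0.25, 0.5]  # hypothetical offspring distribution with m = 1.25
m = mean_offspring(probs)
n = 10
normalized = [simulate_generation_sizes(probs, n, rng)[-1] / m**n for _ in range(2000)]
print(sum(normalized) / len(normalized))  # sample mean of X_n / m^n
```

Individual runs give quite different values of $X_n/m^n$, including 0 on the runs that die out, which is the sense in which the limit is a random variable rather than a constant. The sample mean stays near 1 because $E[X_n] = m^n$.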


3.9 Continuous-Time Markov Chains with a
Finite Number of States

We turn now to a process that we observe continuously as it moves
through a finite number of states. The intuitive picture is this: the process
starts in some state, remains there a random length of time, and then
jumps to another state where again it remains for a random length of
time, etc. A typical history for such a process is a graph such as that
shown in Fig. 4.

Fig. 4. Typical history of a continuous-time Markov chain with a finite number of states.

For every $t$ with $0 \le t < \infty$, we now have a matrix of probabilities
$P(t) = \{p_{ij}(t)\}$, where

$$p_{ij}(t) = \Pr[X_t = j \mid X_0 = i].$$

These matrices satisfy the equation

$$P(s + t) = P(s)P(t).$$

It is customary to assume also that

$$\lim_{t \to 0} P(t) = I.$$

That is, in a very short time there is small probability of the process
moving. The classification procedure is quite similar to that for discrete-time
chains. We say that state $i$ communicates with state $j$ if for some $s$,
$p_{ij}(s) > 0$ and if for some $t$, $p_{ji}(t) > 0$. This gives equivalence classes as in
the discrete-time process. A state $i$ is recurrent if, when the process is started
in $i$, it leaves $i$ and returns with probability 1. A class is recurrent if its
states are recurrent. Equivalently, a state is recurrent if and only if
the mean total time spent in the state is infinite. A state is transient if it
is not recurrent.

An essential simplification occurs for continuous-time chains, namely,
the cyclic behavior that can occur in discrete-time chains cannot occur in
continuous-time chains. In fact, [...] for all sufficiently large $t$.

Associated with any continuous-time process are discrete-time chains
called skeleton chains. These are the discrete-time chains with
transition matrices $P(h)$ for $h > 0$. Such a chain corresponds to observing
the original chain only at those times that are multiples of $h$. The
classes of the chain $P(h)$ are, for any $h$, the same as those of the continuous
chain. Also, the recurrent classes of these chains have period one.
It is often useful to approximate a continuous-time chain by skeleton chains
by taking $h$ smaller and smaller.

In the discrete-time process the process is determined by a
single transition matrix $P$. In the continuous-time case this matrix
is replaced by a matrix $Q$ whose entries are the derivatives of the
transition probabilities at 0. It follows from our assumptions that as a function
of $t$, $p_{ij}(t)$ has a derivative. Then $q_{ij} = p_{ij}'(0)$.

The transition probabilities satisfy two systems of differential
equations. The first, called the backward equation, is

$$P'(t) = QP(t),$$

and the second, called the forward equation, is

$$P'(t) = P(t)Q.$$

The analogy with linear equations suggests that the solution should be

$$P(t) = e^{Qt}.$$

This is indeed so, but we need to say what $e^A$ means when $A$ is a matrix.
We define this by the power series

$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots.$$
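The power series can be summed numerically for a small example. The following is a minimal sketch, assuming a hypothetical two-state rate matrix and a fixed number of terms; truncating the series in this naive way is adequate only when the entries of $Qt$ are of moderate size.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(a, terms=60):
    """Approximate e^A = I + A + A^2/2! + ... by truncating the power series."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = mat_mul(term, a)
        term = [[entry / k for entry in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def transition_matrix(q, t):
    """P(t) = e^{Qt} for a finite chain with rate matrix Q."""
    return mat_exp([[entry * t for entry in row] for row in q])


Q = [[-2.0, 2.0], [1.0, -1.0]]  # hypothetical two-state rate matrix
print(transition_matrix(Q, 0.5))
```

Each row of the result sums to 1, and `mat_mul(transition_matrix(Q, 0.5), transition_matrix(Q, 0.5))` agrees with `transition_matrix(Q, 1.0)` up to rounding, which illustrates the equation $P(s + t) = P(s)P(t)$.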

Theorem 11. Let $P(t)$ be a continuous-time Markov chain with a finite
number of states. Then

$$\lim_{t \to \infty} P(t) = A$$

exists. If $P(t)$ has a single recurrent class, each row of $A$ is the same probability
vector $\alpha = (a_1, a_2, \ldots, a_r)$. This vector is the unique probability
vector such that $\alpha Q = 0$.

The stronger theorem is possible here because we do not have the cyclic
case; hence, we do not ever have to take averages.

The quantities $q_{ij}$ have simple probabilistic interpretations. Let $q_i =
-q_{ii}$. Then if the process starts in state $i$, the length of time that it remains
in this state has an exponential distribution with mean $1/q_i$. That is, if $T$
is the length of time that the process remains in state $i$, then

$$\Pr[T > t] = e^{-q_i t}.$$

Next define a new transition matrix $R = \{r_{ij}\}$ by

$$r_{ij} = \begin{cases} q_{ij}/q_i, & i \ne j, \\ 0, & i = j. \end{cases}$$

Then $r_{ij}$ represents the probability of the process jumping to state $j$ given
that it is in state $i$. That is, if we were to record simply the states to which
the chain moves, we would obtain a history that could be described as the
outcome of a discrete-time Markov chain with transition matrix $R$.
Quantities that have only to do with the number of times different states
are entered can be computed from discrete-time theory applied to $R$.
In fact, any result obtained by applying discrete-time theory to $R$ has an
interpretation in the continuous-time process.
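The passage from $Q$ to the mean holding times and the jump matrix $R$ is a short computation. The sketch below assumes a hypothetical three-state rate matrix in which every $q_i = -q_{ii}$ is positive; the function name is illustrative.

```python
def jump_chain(q):
    """Return (mean holding times 1/q_i, jump matrix R) for a rate matrix Q with q_i = -q_ii > 0."""
    n = len(q)
    holding_means, r = [], []
    for i in range(n):
        q_i = -q[i][i]
        holding_means.append(1.0 / q_i)
        # r_ij = q_ij / q_i off the diagonal and r_ii = 0.
        r.append([0.0 if j == i else q[i][j] / q_i for j in range(n)])
    return holding_means, r


Q = [[-3.0, 2.0, 1.0],
     [1.0, -1.0, 0.0],
     [2.0, 2.0, -4.0]]  # hypothetical three-state rate matrix
means, R = jump_chain(Q)
print(means)
print(R)
```

Because the rows of this $Q$ sum to 0, each row of $R$ sums to 1, so $R$ can be handed to any routine written for discrete-time chains.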

A discussion of the computation of basic descriptive quantities in this
situation is contained in Kemeny and Snell (1960).

3.10 Continuous-Time Markov Chains with a Denumerable
Number of States

We start, as in the finite case, with the transition matrices $P(t)$ satisfying

$$P(s + t) = P(s)P(t)$$

and

$$\lim_{t \to 0} P(t) = I.$$

In this case considerably more complex things can happen. Although
the limit

$$q_i = \lim_{t \to 0} \frac{1 - p_{ii}(t)}{t} \ge 0$$

still exists, it is possible to have

$$\lim_{t \to 0} \frac{1 - p_{ii}(t)}{t} = \infty.$$

In the finite case $1/q_i$ was the mean length of time that the process remains in
state $i$ when it is started in $i$. When $q_i = \infty$, the mean
time the process remains in $i$ is 0, that is, the process instantaneously
moves out of this state into another. Although this can happen, it
is reasonable for most applications to assume that this does not happen.
In other words, we assume that $q_i < \infty$ for all states $i$ in the
chain. Such states are called stable.

In finite chains the sample paths are step functions. We now
have to allow for more complicated behavior. For example, suppose that a
chain can only move to the next higher integer. If we make the mean
length of time that it stays in a state decrease rapidly enough as it moves
higher up the ladder, it is possible for it to pass through all states
in a finite time. Since $1/q_i$ is the mean time the chain stays
in state $i$, this happens when $\sum_i 1/q_i < \infty$ (Fig. 5). [...]
If our process goes to infinity in finite time, it can be brought back in
different ways, as indicated in Figs. 6 and 7. Since it
can jump from infinity, it is also possible to have it jump to infinity, as
indicated in Fig. 8.

Fig. 5. Sample path when the chain moves only to the next higher integer, the mean length of
time in state $i$ is $1/q_i$, and $\sum_i 1/q_i < \infty$.
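The ladder example can be made concrete. As a minimal sketch, assume the hypothetical rates $q_i = 2^i$, so that the mean holding times $1/q_i$ have a finite sum; the truncation at 60 states and the seed are also assumptions made for illustration.

```python
import random


def expected_explosion_time(rates):
    """Sum of the mean holding times 1/q_i along the ladder 0 -> 1 -> 2 -> ..."""
    return sum(1.0 / q_i for q_i in rates)


def sample_passage_time(rates, rng):
    """Total time spent in the listed states: a sum of independent exponential holding times."""
    return sum(rng.expovariate(q_i) for q_i in rates)


rates = [2.0 ** i for i in range(60)]  # hypothetical rates q_i = 2^i
print(expected_explosion_time(rates))  # partial sum of 1 + 1/2 + 1/4 + ...
rng = random.Random(0)
print(sample_passage_time(rates, rng))
```

With these rates the mean time to pass through the first 60 states is already within rounding error of 2, so the process climbs past every state in a finite expected time.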

The definition of recurrent and transient states is the same as for finite,
continuous-time chains. As in discrete-time chains, we now have a further
classification of recurrent states into recurrent null and recurrent positive.
A state is recurrent positive if the mean time to return to it after leaving it
is finite. The basic convergence theorem is the following.

Theorem 12. Let $P(t)$ be the transition matrix of a continuous-time
denumerable-state chain. Then if $\lim_{t \to 0} P(t) = I$,

$$\lim_{t \to \infty} P(t) = A$$

exists. If a class is either recurrent null or transient, [...].

For finite chains, $Q$ determines
uniquely the transition probabilities $P(t)$. This is not so for denumerable
chains, and the reason is roughly the following. The matrix $Q$, being the
derivative at 0 of $P(t)$, [...]. If we were able to construct a process such that it goes to
infinity in a finite time with positive probability, then we should be able
to bring it back in different ways. For example, we might assume that
when at infinity it comes back to [...]. We might also try
to make it come back to state 1. Examples can be constructed to show
that this is possible, each process having the same
initial behavior, that is, acting in exactly the same way until infinity
is reached for the first time. The result is that we have different
processes, but $Q$ is the same.

Fig. 6. One type of return when the sample path goes to infinity.

Fig. 7. A second type of return when the sample path goes to infinity.

The probabilistic interpretation of $Q$ is [...]. That is, if the
process starts in state $i$, the length of time that it remains there has an
exponential distribution with mean $1/q_i$. Also, $r_{ij} = q_{ij}/q_i$ for $i \ne j$ and
$r_{ii} = 0$ are the probabilities that, when the process is started in state $i$
and jumps, it jumps to state $j$. The row sums of $R$ need not be
equal to one. All that can be said is that the row sums are less than or equal
to one. The row sums are less than one when there is a positive
probability that the process has a path function such as that of Fig. 8.
When the row sums of $R$ are equal to one, the system is called conservative,
that is, when $\sum_{j \ne i} q_{ij} = q_i$ for every $i$.

We recall that the transition probabilities for finite chains satisfy two
differential equations,

$$P'(t) = P(t)Q,$$

which is called the forward equation, and

$$P'(t) = QP(t),$$

which is called the backward equation. In the denumerable case neither
system of equations need hold; however, it is always true that

$$P'(t) \ge QP(t),$$
$$P'(t) \ge P(t)Q.$$

The question of the validity of the differential equations can also be
described in terms of the behavior of sample paths. The $i$th row of the
backward equation fails to hold if and only if whenever the process is started
in state $i$ it has as its first discontinuity a jump to $\infty$ with positive
probability. Recall that $r_{ij}$ is the probability that, when it jumped, it jumped to
state $j$. Hence we could expect a jump to infinity to take place with positive
probability if and only if $\sum_j r_{ij} < 1$, that is, if and only if $\sum_{j \ne i} q_{ij} < q_i$,
which is in turn equivalent to the system not being conservative.

Whereas the backward equation has to do with the manner in which
the process goes to infinity, the forward equation has to do with the
manner in which the process returns from infinity (thus showing that the
equations have been named backwards). Specifically, the $j$th column of
the forward equation fails to hold if and only if whenever the process is
conditioned to be at state $j$ at time $t$ there is a positive probability that the
process arrives at $j$ by means of a jump from infinity.

Unfortunately, even when both equations are valid, a unique process
still is not determined. That is, it is possible to have different processes
with the same $Q$ and satisfying both basic differential equations. It is
true, however, that given a matrix with the properties required for a $Q$
matrix of a continuous-time chain, there is always at least one process
$\bar{P}(t)$ whose transition probabilities satisfy both the backward equation
$\bar{P}'(t) = Q\bar{P}(t)$ and the forward equation $\bar{P}'(t) = \bar{P}(t)Q$. The process is
called the minimal process because if $P$ is any other process with the same
$Q$, then $P(t) \ge \bar{P}(t)$ for every $t$. The minimal process has the property
that in any finite time interval there are at most a finite number of jumps.
It is not necessarily an honest process in the sense that the row sums of
$\bar{P}(t)$ equal one; they may be less than one. That is, in this process there
may be a positive probability that the process will disappear. The
behavior of any process with the same $Q$ as the minimal process is the
same as the minimal process up to the first time that it reaches infinity.
For this reason, if the process has the property that with probability 1 it
does not reach infinity in finite time, we can be sure that it is, in fact, the
minimal process.

A detailed discussion of these points may be found in Chung (1960).

3.11 Applications and Examples of Continuous-Time Chains

BIRTH AND DEATH PROCESSES. Continuous-time chains do not seem
to have found application in psychology yet; however, an important
class of continuous-time chains, about which much is known, has been
widely applied in other fields. These are the birth and death processes,
which are distinguished by the feature that they move by a jump of at
most one unit.

A birth and death process is a denumerable-state continuous-time
Markov chain with transition functions $P(t)$ that satisfy the following
conditions (as $t \to 0$):

$$p_{i,i+1}(t) = \lambda_i t + o(t),$$
$$p_{i,i-1}(t) = \mu_i t + o(t),$$
$$p_{i,i}(t) = 1 - (\lambda_i + \mu_i)t + o(t),$$

where $\lambda_i > 0$ for $i \ge 0$, $\mu_i > 0$ for $i \ge 1$, and $\mu_0 \ge 0$. The notation $o(h)$
means a quantity so small that $o(h)/h$ approaches 0 as $h$ approaches 0.
From our assumptions it follows immediately that the $Q$ matrix must
be of the form

$$q_{i,i+1} = \lambda_i, \qquad i \ge 0,$$
$$q_{i,i} = -(\lambda_i + \mu_i), \qquad i \ge 0,$$
$$q_{i,i-1} = \mu_i, \qquad i > 0.$$
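The form of the $Q$ matrix can be seen by writing out a finite corner of it. The sketch below assumes hypothetical rates $\lambda_i = 1 + i$ and $\mu_i = 2i$; the function name and the truncation to four states are illustrative only.

```python
def birth_death_q(birth, death, size):
    """Upper-left size x size corner of Q with q_{i,i+1} = lambda_i and q_{i,i-1} = mu_i."""
    q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        lam, mu = birth(i), death(i)
        q[i][i] = -(lam + mu)
        if i + 1 < size:
            q[i][i + 1] = lam  # birth: one step up
        if i > 0:
            q[i][i - 1] = mu   # death: one step down
    return q


# Hypothetical rates: lambda_i = 1 + i and mu_i = 2 i (so mu_0 = 0).
for row in birth_death_q(lambda i: 1.0 + i, lambda i: 2.0 * i, 4):
    print(row)
```

Every row except the last sums to 0. The last row does not, only because the entry $\lambda_3$ falls outside the truncated corner.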

The quantities $\lambda_i$ and $\mu_i$ are often called birth rates and death rates. This
terminology arises from applications in which the state is the population
at time $t$ in some sort of growth process. Since the $Q$ matrix is conservative,
we know that the backward differential equations are satisfied, that is,

$$P'(t) = QP(t).$$

The forward equations,

$$P'(t) = P(t)Q,$$

need not be satisfied. Examples for which this equation fails can be
constructed by making the birth rates increase so fast that there is a positive
probability that the process will go to infinity in finite time, and the
process is then brought back to a regular state by some probability
distribution.

The question of when the two equations are satisfied has been completely
answered by Feller (1959), who gave simple conditions on the birth and
death rates for all possible situations. He showed that the possibilities
for uniqueness of solutions of these equations have to do with the nature
of infinity as a boundary point. Roughly speaking, he showed the
following. Infinity acts as an exit-type boundary point if starting from any state
there is a positive probability that in finite time the process will drift off
to infinity before hitting 0. It acts as an entrance point if starting from
infinity it can reach 0 in a finite time with a positive probability. More
precisely, there is a number $c > 0$ such that the probability of reaching 0
before time $t$ is greater than $c$ for any starting state, no matter how far
away from 0 it is. Infinity is called a natural boundary point if it can act
neither as an exit nor as an entrance point. In this case the forward and
backward equations have a unique solution which is, of course, the minimal
solution. If infinity can act as an exit boundary but not an entrance
boundary, there is a unique solution to the forward equation and an
infinite number of different solutions to the backward equation. If
infinity can act as an entrance boundary point but not an exit, the forward
equation has an infinite number of different solutions and the backward
one a unique solution. Finally, if infinity can act both as an entrance and
an exit boundary point, both equations have an infinite number of
different solutions.

In the first three cases, a unique process, namely, the minimal process,
satisfies both systems of differential equations. Recall that this process
agrees with any other process up to the time that infinity is reached for
the first time. Karlin and McGregor (1957) studied in detail this minimal
process when it is the unique solution. They found simple criteria in
terms of the birth and death rates for the process to be recurrent or
transient. In a recurrent process, they gave criteria to distinguish the
recurrent positive and recurrent null chains. They gave methods for
computing a large number of descriptive quantities for both transient and
recurrent chains. These include the actual distribution of first passage
times in the recurrent case. Their method is similar to that used to study
random walks in that they obtain an expression for $p_{ij}(t)$ for the minimal
process which is of the form

$$p_{ij}(t) = \pi_j \int_0^\infty e^{-xt} Q_i(x) Q_j(x)\, d\psi(x),$$

where $\pi_0 = 1$, $\pi_j = \dfrac{\lambda_0 \lambda_1 \cdots \lambda_{j-1}}{\mu_1 \mu_2 \cdots \mu_j}$, and $Q_0, Q_1, \ldots$ is a sequence of
orthogonal polynomials with respect to the measure $\psi$. The polynomials satisfy
the recursive equations

$$Q_0(x) = 1, \qquad Q_{-1}(x) \equiv 0,$$
$$-x Q_n(x) = -(\lambda_n + \mu_n) Q_n(x) + \lambda_n Q_{n+1}(x) + \mu_n Q_{n-1}(x).$$

Many important special processes correspond to well-known systems of
classical polynomials, and when this happens a method by Karlin and
McGregor (1957) for computing the minimal process can be used for a great
variety of basic quantities, such as distributions of first passage times for
recurrent chains and time to absorption for absorbing chains. They have
also studied in great detail a specific type of birth and death process
called the linear birth and death process. These are processes in which

$$\lambda_n = n\lambda + a, \qquad \lambda > 0,$$
$$\mu_n = n\mu + b, \qquad \mu > 0.$$

For details see Karlin and McGregor (1958).

4. CHAINS OF INFINITE ORDER

Consider now a stochastic process in which the outcomes are from a
finite set $A$. In defining a chain of infinite order, we allow the complete
past to have an influence on the next prediction, but we try to capture the
idea that the influence of the distant past is small. The results discussed
in this section are discussed more completely by Lamperti and Suppes
(1959).

We shall make two assumptions, labeled A and B, about the process
$\{X_n\}$. When these assumptions hold we say that $\{X_n\}$ is a chain of
infinite order.

Assumption A. There is some state $a_0$ and some integer $n_0$, and a number
$\delta > 0$ such that

$$\Pr[X_{n+n_0} = a_0 \mid X_n = a_n, \ldots, X_1 = a_1] > \delta$$

for every choice of $a_1, a_2, \ldots, a_n$.

In other words, no matter what the past history, the probability that we
will be at state $a_0$ after $n_0$ steps is greater than $\delta$.

For the second assumption, we wish to make precise the idea that the
distant past does not have much influence on the prediction for the next
experiment. Let $a_1, \ldots, a_r, c_{r+1}, \ldots, c_{r+s}$ and $b_1, \ldots, b_r, c_{r+1}, \ldots, c_{r+s}$ be two
histories of length $r + s$ in which the final $s$ outcomes are the same.
Assume further that the outcome $a_0$ of assumption A occurs at least $m$
times in these final $s$ outcomes. Given these two different past histories,
consider the difference between the probabilities of outcome $a$, that is,

$$\Pr[X_{r+s+1} = a \mid a_1 \ldots a_r c_{r+1} \ldots c_{r+s}] - \Pr[X_{r+s+1} = a \mid b_1 \ldots b_r c_{r+1} \ldots c_{r+s}].$$

Let $\varepsilon_m$ be the smallest number such that these differences are less in
absolute value than $\varepsilon_m$ no matter what choice we make for $a$ and for the
sequences $a_1, \ldots, a_r$; $b_1, \ldots, b_r$; $c_{r+1}, \ldots, c_{r+s}$, provided that
the $c$'s contain $a_0$ at least $m$ times.

Assumption B. The numbers $\varepsilon_m$ have the property that $\sum_m \varepsilon_m < \infty$.

The principal result for chains of infinite order is the following
ergodic-type theorem.

Theorem 13. If $X_1, X_2, \ldots$ is a chain of infinite order, then as $n$ tends to
infinity

$$\Pr[X_{m+n} = a \mid X_m = a_m, \ldots, X_1 = a_1]$$

converges to a limiting value $\pi_a$ which is independent of $a_1 a_2 \ldots a_m$;
moreover, $\sum_a \pi_a = 1$.

Note that as functions of $a_1 a_2 \ldots a_n$ the probabilities

$$\Pr[X_{n+1} = a \mid a_n \ldots a_1]$$

are random variables. In some applications the moments of these random
variables are important. Lamperti and Suppes (1959) proved that these
moments have limiting values as $n$ tends to infinity. This paper also
includes a proof of Theorem 13.


4.1 An Application

Lamperti and Suppes applied the preceding results to the linear model
described by Estes and Suppes, Chapter 8 of Bush and Estes (1959). See
Sec. 1 for a definition of this model. Recall that a typical history $\omega$ is
of the form

$$\omega = (a_1 e_1 a_2 e_2 \ldots),$$

where $a_n$ is the response of the subject on the $n$th trial and $e_n$ is the
reinforcement event of the experimenter. Denoting the outcome functions
by $A_1, E_1, A_2, E_2, \ldots$, the basic linear axiom is, for $e_n = k$,

$$\Pr[A_{n+1} = j \mid e_n a_n \ldots e_1 a_1] = (1 - \theta_k)\Pr[A_n = j \mid e_{n-1} a_{n-1} \ldots e_1 a_1] + \theta_k \lambda_{kj},$$

where $0 \le \theta_k \le 1$ and $\sum_j \lambda_{kj} = 1$.
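A single step of the linear axiom is easy to carry out numerically. The sketch below assumes the common special case in which one learning rate $\theta$ applies to every reinforcement event and the reinforced response is moved toward probability 1; the function name, the starting probabilities, and the event sequence are hypothetical.

```python
def linear_update(probs, reinforced, theta):
    """One trial of the linear model: p_j <- (1 - theta) p_j + theta * [j == reinforced]."""
    return [(1.0 - theta) * p + (theta if j == reinforced else 0.0)
            for j, p in enumerate(probs)]


probs = [0.5, 0.5]          # hypothetical initial response probabilities
for event in [0, 0, 1, 0]:  # hypothetical sequence of reinforcement events e_n
    probs = linear_update(probs, event, theta=0.2)
print(probs)
```

The response probabilities depend on the whole sequence of reinforcement events, not just on the last one, which is why the process of responses is not a Markov chain of finite order.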

We consider then the probabilities

$$\Pr[A_n = j \mid e_{n-1} a_{n-1} \ldots e_1 a_1],$$

which for $j$ fixed is a random variable. Let $\alpha^r_{j,n}$ denote the $r$th moment of
this random variable, that is,

$$\alpha^r_{j,n} = \sum \{\Pr[A_n = j \mid e_{n-1} a_{n-1} \ldots e_1 a_1]\}^r \Pr[e_{n-1} a_{n-1} \ldots e_1 a_1].$$

Theorem 14. Assume that (i) the reinforcement schedule [...], and
(ii) for some $k$, $\theta_k \ne 0$, and for this $k$ there is an $n_0$ and a $\delta > 0$ such that for
all sequences $a_1, e_1, \ldots, a_n, e_n$,

$$\Pr[E_{n+n_0} = k \mid e_n a_n \ldots e_1 a_1] > \delta.$$

Then $\lim_{n \to \infty} \alpha^r_{j,n} = \alpha^r_j$ exists and is independent of the initial distribution of
responses.

It follows, in particular, that when the hypotheses are satisfied, $\Pr[A_n = j]$
has a limiting value as $n$ tends to infinity which is independent of
the initial distribution. More generally, the probability of any finite
sequence of outcomes occurring has a limit as $n$ tends to infinity.

The model most often used in applications assumes that $\theta_k = \theta \ne 0$,
$\lambda_{kk} = 1$, and $\lambda_{kj} = 0$ for $j \ne k$. [...] reinforcement
schedule with lag [...]. For this case, for the hypotheses of Theorem 14 to be
satisfied we need only assume that some response $k$ has a positive
probability of being reinforced on the $n$th trial no matter what the outcome of
[...] was; similarly, for the [...].

These results were extended [...] to two-person learning situations
in which the reinforcements depend on the responses of
both subjects. This again leads to a situation in which Markov theory is
not applicable, and it is necessary to apply the theory of
chains of infinite order to prove asymptotic results.

5. MARTINGALES

[...] which is such that at each trial [...]. [...] system theorems and [...]
are suggested by the following game interpretation. If a game is fair, that
is, has the martingale property, then it should remain fair if the gambler
is allowed to use certain systems of play. Of course, there must be some
restrictions on the systems allowed. He should not be allowed, for
example, to look into the future. We illustrate this in terms of one kind of
system called optional stopping. This means that the player is allowed
to watch how his fortune proceeds and to stop playing at some time
which seems advantageous to him. We describe this mathematically by
introducing a random variable $t$ such that the decision to stop at time $n$ is
represented by $t(\omega) = n$. Here $\omega$ is a sample sequence for the process
$X_1, X_2, \ldots$. The condition that the future is not available to the player
means that we must be able to tell if $t(\omega) = n$ by knowing only the first
$n$ components of $\omega = (a_1, a_2, \ldots, a_n, \ldots)$. Sometimes it is
convenient to allow $t(\omega) = \infty$, which means that the player does not stop,
but we shall assume that the probability is zero that this happens. Given
a stopping time $t$, we can associate a final random variable $X_t$ which in
the gambling interpretation represents the player's final fortune. If $\omega =
(a_1, a_2, \ldots)$ and $t(\omega) = n$, then $X_t(\omega) = a_n$. Now if the game is
to remain fair under an optional stopping system of play, it must be the
case that $E[X_t] = E[X_1]$. That is, in terms of expected value the system
provides no advantage. The first theorem that we mention states that
under a suitable hypothesis this is true.

Theorem 15. Let $X_1, X_2, \ldots$ be a martingale observed until the random
stopping time $t$. Assume that during the period of observation all the
outcomes are in absolute value less than some constant $K$. Then if $X_t$ is
the final observation, $E[X_t] = E[X_1]$.

A proof of Theorem 15 can be found in Doob (1953, Chapter 7).

A sufficient condition for the theorem to hold for any stopping time is
for the process $X_1, X_2, \ldots$ to be uniformly bounded. That some such
condition is necessary can be seen from the penny-matching game, which
[...] is a martingale. If each player has an infinite
amount of money, so that play can continue as long as he pleases, then one
player can use the following stopping rule: play
until he is one penny ahead and then stop. Then his initial winning is 0 and his final
winning with probability 1 will be 1. That is, $E[X_1] = 0$ and $E[X_t] = 1$.

Theorem 15 can be used, in fact, to find the
probability that one player is ruined before the other. Assume, for
example, that Peter has $A$ pennies and Paul has $B$ pennies. [...] The
coin-tossing process is stopped [...] the
time that play continues. Let $p$ be the probability that Paul wins. Then
Paul's expected final fortune is

$$pA - (1 - p)B.$$

However, his expected initial winning was 0, so that by the theorem

$$0 = pA - (1 - p)B,$$

or $p = B/(A + B)$. This is typical of how a gambling-type interpretation [...].
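The value $p = B/(A + B)$ can be checked by simulation. The sketch below assumes fair penny matching in which Paul's fortune moves up or down by one penny per play; the stakes, the number of trials, and the seed are hypothetical.

```python
import random


def paul_wins(a, b, rng):
    """Play fair penny matching until Paul has won a pennies or lost b pennies."""
    fortune = 0
    while -b < fortune < a:
        fortune += 1 if rng.random() < 0.5 else -1
    return fortune == a


def estimate_win_probability(a, b, trials, rng):
    """Fraction of simulated games in which Paul wins all of Peter's pennies."""
    return sum(paul_wins(a, b, rng) for _ in range(trials)) / trials


rng = random.Random(0)
a, b = 3, 2  # hypothetical stakes: Peter has 3 pennies, Paul has 2
print(estimate_win_probability(a, b, 4000, rng), b / (a + b))
```

The two printed numbers should be close, the first being the simulated frequency and the second the value $B/(A + B)$ given by the optional stopping argument.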

[...] from the following theorem.

Theorem 16. Let $X_1, X_2, \ldots$ be a martingale with nonnegative random
variables. Then with probability 1, $X_n$ converges to a limiting value
$X_\infty$. Furthermore, $E[X_\infty] \le E[X_1]$. If in addition the $X_n$ are
uniformly bounded in $n$, then $E[X_\infty] = E[X_1]$.

This theorem is proved in Doob (1953, Chapter 7).

As an illustration of the use of this theorem, recall that a
branching process is a Markov chain with states
0, 1, 2, ..., such that if the chain is in state $i$, then $i$ independent
experiments with integer-valued outcomes are performed and the next state is the
sum of the outcomes. Assume we start in state 1, that is, with one particle,
and let $X_n$ be the resulting process. Let $m$ be the mean number
of particles produced by a single particle, and assume first that $m = 1$. Then
$X_1, X_2, \ldots$ is a nonnegative martingale and by the theorem it converges
to a limiting random variable [...]. Since we know [...] to
infinity unless at some time [...], and since the limit
is finite, it must be that $X_n$ is eventually equal to 0. That is, the process
dies out with probability 1. Here is an example
for which the expected value of the limit is less than the expected value of
the initial outcome.

Consider next the case $m > 1$. [...] is a martingale and [...] converges
[...]. With positive probability the process continues
indefinitely [...]. The probability
of the process dying out can also be found by martingale arguments as
follows. If $m > 1$, there is a positive solution $a < 1$ of the equation

$$a = p_0 + p_1 a + p_2 a^2 + \cdots,$$

where $\{p_0, p_1, p_2, \ldots\}$ is the distribution of the number of offspring
produced by a single particle. Then the process $Y_n = a^{X_n}$ is a
martingale. The outcomes of this process are bounded, so the expected value of
the final outcome is the same as that of $Y_1$, which in turn is $a$. However, the
limit of this martingale is 0 on a path where $X_n$ does not die out and 1
on a path where it does. Hence the expected value of the limit of $Y_n$ is the
probability that the process dies out. In other words, $a$ is the probability
that the process dies out. For a complete discussion of martingales see
Doob (1953, Chapter 7).
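The solution $a$ of the equation $a = p_0 + p_1 a + p_2 a^2 + \cdots$ can be found numerically. The sketch below uses fixed-point iteration started at 0, a standard technique that is not described in the text; it converges to the smallest nonnegative solution. The offspring distribution is hypothetical.

```python
def generating_function(offspring_probs, s):
    """f(s) = p_0 + p_1 s + p_2 s^2 + ... for a finite offspring distribution."""
    return sum(pj * s**j for j, pj in enumerate(offspring_probs))


def extinction_probability(offspring_probs, iterations=500):
    """Iterate a <- f(a) starting from a = 0."""
    a = 0.0
    for _ in range(iterations):
        a = generating_function(offspring_probs, a)
    return a


print(extinction_probability([0.25, 0.25, 0.5]))  # hypothetical distribution with m = 1.25
```

For this distribution the equation reads $a = 0.25 + 0.25a + 0.5a^2$, whose solutions are $a = 0.5$ and $a = 1$, and the iteration settles on the smaller one.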


6. STATIONARY PROCESSES

Recall that a discrete stochastic process $\{X_n, n = 1, 2, 3, \ldots\}$ is
stationary if

$$\Pr[(X_{h+1} = a_1) \wedge (X_{h+2} = a_2) \wedge \cdots \wedge (X_{h+n} = a_n)]$$

is the same for every $h$, which in words means that in predicting a particular
sequence of outcomes the time at which we start our predictions is not
relevant.

We have already seen two examples of stationary processes. The first
was a sequence of independent random variables with a common
distribution. The second example arose as follows. Let $P$ be the transition
matrix of a recurrent Markov chain, and let $\alpha$ be the stationary measure,
that is, $\alpha P = \alpha$. Then if $\alpha$ is a probability vector, we can use $\alpha$ as a starting
distribution and the resulting process is a stationary process.

For each of these examples we have seen that the averages

$$\frac{X_1 + X_2 + \cdots + X_n}{n}$$

converge with probability 1 as $n$ tends to infinity. When $\{X_n\}$ is an
independent process, this average converges almost everywhere to a
constant value, namely, to $E[X_1]$. For Markov chains, the limit is the
same for all paths that start in the same recurrent class but differs among
classes. Thus, for both the independent process and the ergodic Markov
chain, we have a limiting random variable, but it may not be a constant
with probability 1. These results are all consequences of a very general
theorem known as the ergodic theorem for stationary processes.

Theorem 17. Let $\{X_n\}$ be a stationary process and let $f$ be a function such
that $E[f(X_1)]$ is finite. Then

$$\frac{f(X_1) + f(X_2) + \cdots + f(X_n)}{n}$$

converges with probability 1.

A proof of Theorem 17 may be found in Doob (1953).

In addition to knowing that the averages converge, we would like to
know under what conditions the averages converge to the expected value of the
individual trials. For this we need a new concept. Let $T$ be the
transformation of points $\omega$ of the sample space $\Omega$ that simply
replaces the history $\omega$ by the history $\omega'$ obtained from $\omega$ by dropping the
first outcome and shifting the others to the left.
If $A$ is a subset of $\Omega$, the set $TA$ is the set of all points $\omega'$ of the form
$T\omega$ for $\omega$ in $A$. A set $A$ is said to be invariant under the shift operation
if $TA$ and $A$ differ by
at most a set of $\omega$'s of probability 0. The whole space $\Omega$ is invariant under
the shift operation since $T\Omega = \Omega$, and the same is true for the empty set.

Definition 21. A stationary process $\{X_n\}$ is metrically transitive
if the shift operation has only [...] invariant sets.

An independent process is metrically transitive, although this requires a
proof. The second example above, a Markov chain started in
equilibrium, need not be metrically transitive. Indeed, let us assume that
there are two recurrent classes $A$ and $B$. Then the set of all sequences
that begin with a state in $A$ is an invariant set. Its measure is $\sum_{i \in A} \alpha_i$,
which is not 0 because the components $\alpha_i$ are positive, and is not 1
because the sum of all the $\alpha_i$ adds up to 1. If there is only a single class,
the resulting process is metrically transitive. For the two examples that we
have given which are metrically transitive, namely, independent processes
and recurrent Markov chains with a single class, the law of large numbers
states that the averages converge to a constant. These are special cases of
the following theorem for stationary processes.

Theorem 18. Let $\{X_n\}$ be a stationary process that is metrically transitive.
If $f$ is a function such that $E[f(X_1)]$ is finite, then

$$\frac{f(X_1) + f(X_2) + \cdots + f(X_n)}{n} \to E[f(X_1)]$$

with probability 1.

See Doob (1953, Chapter 10) for a proof of this theorem.


6.1 Examples and Applications

A third example of a stationary process is the following. Suppose, for example,
that $Y_1, Y_2, \ldots$ is an independent process with common distribution and that at
each stage we record the
average of the last three outcomes. That is, we start at time three and
record

$$X_1 = \frac{Y_1 + Y_2 + Y_3}{3},$$
$$X_2 = \frac{Y_2 + Y_3 + Y_4}{3},$$
$$\vdots$$
$$X_n = \frac{Y_n + Y_{n+1} + Y_{n+2}}{3}.$$

Then the process $X_1, X_2, \ldots$ is a stationary process. More generally,
if $Y_1, Y_2, \ldots$ is an independent process with common distribution, and if

$$X_n = c_0 Y_n + c_1 Y_{n+1} + \cdots + c_r Y_{n+r}$$

for any fixed $c_0, c_1, \ldots, c_r$, then $X_1, X_2, \ldots$ is a stationary process which
is called a process of moving averages.
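A process of moving averages is simple to compute from a finite record of the $Y$'s. The following is a minimal sketch; the function name and the sample values are hypothetical.

```python
def moving_average_process(y, coeffs):
    """X_n = c_0 Y_n + c_1 Y_{n+1} + ... + c_r Y_{n+r} for every n where all terms exist."""
    r = len(coeffs) - 1
    return [sum(c * y[n + k] for k, c in enumerate(coeffs))
            for n in range(len(y) - r)]


y = [3.0, 0.0, 6.0, 3.0, 9.0]  # hypothetical outcomes Y_1, ..., Y_5
print(moving_average_process(y, [1 / 3, 1 / 3, 1 / 3]))
```

With equal coefficients $1/3$ this reproduces the three-term averages of the example; a record of five $Y$'s yields three $X$'s.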

Another way in which stationary processes have arisen is the following.
Suppose $X_1, X_2, \ldots$ is a process that is

1. stationary, and

2. $X_n = c_0 X_{n-1} + c_1 X_{n-2} + \cdots + c_{r-1} X_{n-r} + Y_n$,

where $Y_1, Y_2, \ldots$ is a sequence of identically distributed random variables.
The first question is whether such a process can exist at all, and, if so,
whether it is in some sense unique. This is quite similar to the procedure
used when observing a deterministic phenomenon $X(1), X(2), \ldots$, when
we postulate that the $X(n)$ satisfy a difference equation of the following
form:

$$X(n) = c_0 X(n - 1) + c_1 X(n - 2) + \cdots + c_{r-1} X(n - r) + Y(n),$$

where $Y(n)$ is a known function of $n$.

Procedures for solving this stochastic difference equation have been
developed and conditions under which a stationary solution exists are
known. A process that is stationary and satisfies a difference equation of
the preceding type is known as one of autoregression. For a discussion
of these processes see Rosenblatt (1962).
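The difference equation itself is straightforward to iterate. The sketch below only illustrates the recursion from given starting values and a given sequence $Y_n$; it does not construct a stationary solution, and the function name and numbers are hypothetical.

```python
def autoregression(coeffs, noise, start):
    """Extend `start` by X_n = c_0 X_{n-1} + c_1 X_{n-2} + ... + Y_n, one step per noise value."""
    x = list(start)
    for y_n in noise:
        past = x[::-1][:len(coeffs)]  # X_{n-1}, X_{n-2}, ...
        x.append(sum(c * v for c, v in zip(coeffs, past)) + y_n)
    return x


print(autoregression([0.5], [1.0, 1.0, 1.0], [0.0]))
```

In practice the $Y_n$ would be random draws rather than the constant values used here; constants are chosen so that each step can be followed by hand.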

A problem that has received a great deal of attention is that of predicting
future outcomes in a stationary process when the outcomes up to a certain
time are known. To describe this problem more precisely, it is convenient
to assume that we are observing a process that has been going on for an
infinite length of time and that will continue indefinitely. More precisely,
the process is described by random variables of the form

$$\ldots, X_{-n}, \ldots, X_{-1}, X_0, X_1, \ldots, X_n, \ldots.$$

Stationarity is generalized to mean that

$$\Pr[(X_n = a_1) \wedge (X_{n+1} = a_2) \wedge \cdots \wedge (X_{n+k} = a_{k+1})]$$

is independent of $n$.

The problem, then, is the following: given the outcomes up to time $n$, how do we
estimate $X_{n+r}$? The estimate $X^*_{n+r}$ is a
random variable [...], and we take

$$E[(X_{n+r} - X^*_{n+r})^2]$$

as a measure of the error made. This measure is somewhat
arbitrary, but it has been chosen [...]. The
problem is then to find a method of estimation, that is, the estimate $X^*_{n+r}$, that
minimizes the expected value of the square of the errors, that is, that
minimizes $E[(X_{n+r} - X^*_{n+r})^2]$. [...]
that the estimate is assumed to be a linear combination of previous
outcomes,

$$X^*_{n+r} = c_n X_n + c_{n-1} X_{n-1} + \cdots$$

for some $c_n, c_{n-1}, \ldots$. Details of this solution are given in Rosenblatt (1962).

^Ta«nalappheation.wemen.io^^^ 

process, which is an important concep outcomes 

Iheory as introduced by Shannon If an ejp ,,, „„„py 

a, n/ , n, with probabilities p(n,). pfn^I. 
isdefinedtobe -2.(«i) ^ 

If Zis the outcome function for this experiment. 

H = E[-logp(-I0] 

.ewou,d,however,hhetr^inh^^g^ 


n OI course, »•* & nrbitrarv statjonai^f 

‘X%cedmg"defin.t.on ,3„ assign an entropy 

urce However, using 


of text o) 
use 

source a - 

to the partial source 

2 Ku.n. n.) >08 


Or), 


_pp 1 (v =oOA(jr» = ‘'=)A 
where i.t_ ^ mitcomes 


A (X = ^r)l 

^ (^2 = ^ y n aho wntc this 

rh'';s"m'’rSenover;Uposs.hle«rstrouteomes We ca 

ns -El\oslKXi,X.. 
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^ y. + n + n 

X . n + n + n 

" 3 

X = ^«+l + ^«+2 

” 3 

Then the process X^, is a stationary process More generally, 
if Ti, Tj, IS an independent process with common distribution, and if 

for any fixed Cq, , c^, then A\, X^, is a stationary process which 

IS called a process of moving averages 
Another way m which stationary processes have arisen is the following 
Suppose X^, X 2 , IS a process that is 

1 stationary, and 

2 1 + CiX „_2 + + CfX„_f + Y^, 

where y„ Y^, is a sequence of identically distributed random variables 
The first question is whether such a process can exist at all, and, if so, 
whether it is in some sense unique This is quite similar to the procedure 
used when observing a deterministic phenomenon A'(l), X<2), when 
we postulate that the X(n) satisfy a difference equation of the following 


+ c,X(n -r)+ Y(ri) 


Xip) = c„i-(ii - 1) + c,Jf(n - 2) + 
where y(ii) is a known function of n 
deveWd'"/"'' 'a'""® ^t^hastic difference equation have been 
to™ A nr^ “ m solution exists are 

te nrecetnT, v =* difference equation of 

of th«e oroce« au,oregress,o. For a Iscussion 
01 inese processes see Rosenblatt (1962) 

futte ^Icomtln aTr"'"’ ‘hat of predicting 

tte are known when the outcomes up to a certain 

to asrmX we nmT P™*”™ "’°‘= P‘==-‘='y. “ convenient 

infinite length of time and^tr't"* has been going on for an 

• » X_2, ALj, A'o, Xi, A'j, 
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STATIONARY PROCESSES 

Stationarity is generalized to mean that 

= Oi) A = «2) ^ 

IS independent of « for both positive and 

The problem, then is GivenJJhe 0“*“ have a way of 

predict the outcome wg therefore introduce a measure of 

telling what is the best predictio j^e 

“r ' “J'lr" » “"! £ 

random variable |A„+r ^n+r\> ^ ® 

- JfJW 

Of course this measure of error is rafter 

as a measure of the error made mathematical convenience The 

arbitrary, but it has been ,j.jj^minmg the estimate X„+, that 

problem is then to find a me o the errors that is that 

minimizes the expected value ^hlem has been solved provided 

minimizes £[(l^„+r - ^ ^ I near combination of previous 

that the estimate is assumed to be a line 

outcomes, ^cX„ + c, + 

for some c„,c„., '^'DeJls of this solution are given in Rosenblatt 

^'rahnalapphcation, we mention— 

process, which is an experiment results in outcomes 

Loryasmtrod^^^^^^^ ,p(n,, then the entropy 

IS defined to be 

:rXistheoutcomefunctionforth,sexperiment,ftenwecanwri 

H = E[-\o5piX)] 

i,.’ '..ii..*-. 

,, -ulA = =‘"‘‘ 

C a) = = v\c can also wnle this 

ftc"L"^rSen oxer al, possible firs, r outcomes 

“5 -Ellogpft.. '< • 
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H, = --EllogpCX^.X^, ,X,)] 

n 

Then if H, it is reasonable to call H the entropy per sig al 
Actually more than this has been proved The sequence 

y„ = — - log p(Xi, Xz, , X„) 
n 

has been shown to converge with probability 1 to a constant H and 
H„ - E[ yj -> H provided only that the source is a stationary, metrically 
transitive process The proof of this assertion involves using both the 
ergodic theorem and the basic martingale theorem An interesting discus- 
sion of the idea of this proof may be found in Chung (1962) 
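The quantities H and H_n can be computed directly for a small source. The sketch below is an illustration under assumptions of its own: the source is taken to be a stationary two-state Markov chain (a special case chosen here, not one discussed in the text), logarithms are natural, and the function names are invented. For such a source the limit of H_n is known to be the average of the row entropies weighted by the stationary distribution, and the code lets one watch H_n approach it.

```python
import math
from itertools import product


def entropy(probs):
    """H = -sum p log p (natural logarithm), skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def block_entropy_per_signal(initial, transition, n):
    """H_n = -(1/n) E[log p(X_1, ..., X_n)] for a Markov source, by enumeration."""
    states = range(len(initial))
    total = 0.0
    for seq in product(states, repeat=n):
        p = initial[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= transition[a][b]
        if p > 0:
            total -= p * math.log(p)
    return total / n


def markov_entropy_rate(stationary, transition):
    """Limit of H_n for a stationary Markov source: sum_i pi_i * H(row i)."""
    return sum(pi * entropy(row) for pi, row in zip(stationary, transition))


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.5, 0.5]]
    pi = [5 / 6, 1 / 6]                      # stationary distribution of P
    for n in (1, 2, 4, 8, 12):
        print(n, block_entropy_per_signal(pi, P, n))
    print("limit", markov_entropy_rate(pi, P))
```

The enumeration grows exponentially in n, so this is only practical for short blocks and small alphabets.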


7 CONCLUDING REMARKS 

We have considered four basic types of stochastic processes, namely,
independent processes, Markov processes, martingales, and stationary
processes. This is a rather small number of types, considering the wide
variety of applications that have been found for stochastic processes.
Although it is easy to define new types of processes, it is not always easy to
work with them mathematically. A good example of the fact that one
cannot neglect the mathematical difficulties is to be found in the observation
that every process is, in fact, a Markov process if one just looks at it
correctly. To see this, assume that X_1, X_2, \ldots is a discrete process taking
on values in a set A. We form a Markov chain in which a state is a pair
(n, a_1, a_2, \ldots, a_n), where n is any integer and a_1, a_2, \ldots, a_n are n possible
values from A. If the first n outcomes of the process X_1, X_2, \ldots, X_n are
a_1, a_2, \ldots, a_n, respectively, then we say that our chain is in the state
(n, a_1, a_2, \ldots, a_n). It moves to a state of the form

    (n + 1, a_1, a_2, \ldots, a_n, a_{n+1})

with probability given by

    p_{(n, a_1, \ldots, a_n), (n+1, a_1, \ldots, a_n, a_{n+1})}
        = \Pr[X_{n+1} = a_{n+1} \mid (X_n = a_n) \wedge \cdots \wedge (X_1 = a_1)].

Since the study of this Markov chain is equivalent to the study of the
original process, we can reduce the study of the most general discrete
process to the study of denumerable Markov chains. The difficulty with
this approach is that the state space is quite complicated and unnatural
from the point of view of applications.

It is clear that some reasonable compromise must be made between the
assumptions that one wants to make about the process being observed
and assumptions that lead to a process that can be studied. A good
example of just such a compromise is found in the theory of queues, which is
devoted to processes that arise in the following manner. Persons arrive at
random at a service station where they are served, the service time also
being a random quantity, and they then leave. One must specify the
nature of the random arrivals and of the random service times. The study
started with quite simple and, for many situations, unrealistic assumptions.
It was found that even with simplifying assumptions, quite useful
information was gained about practical queueing problems. The subject
has by now been studied intensively; the voluminous literature includes a
great variety of assumptions to fit a wide range of applications. For an
introductory treatment of this subject, see ... (1961).

Many users of present-day stochastic process theory object to the
emphasis on stationarity. The types of process studied in greatest detail,
independent trials and Markov chains, each have a stationarity assumption.
With independent trials, each trial is assumed to have the same
distribution; in a Markov chain, the transition probabilities are assumed
to be the same at each time. It is natural in applications to ask that these
assumptions be weakened. Results on Markov chains with nonstationary
transition probabilities are known, but this class of processes is so general
that little of a specific nature can be said. Even the simplest cases are
extremely complicated to analyze, and widely different types of behavior
are possible depending on the nature of the transition probabilities. Some
ergodic theorems and some limit theorems for these processes have been
discovered. See, for example, Dobrusin (1956). It seems necessary to find
some specific ... before the study becomes ...

We have given only the basic ... Although ... in 1953 was able to survey
... of the subject, it is simple to describe processes about which little or
nothing is known. For example, studies on the growth of cancer have
suggested the following simple process. From a
large number of unit cubes, choose one and attach another randomly to
one of its faces. Add a third randomly to one of the available faces of the
pair. Continue in this way, each time adding a block at random to whatever
faces happen to be available. Practically nothing is known about this
very simple process. For example, it is not known whether the growth
approximates a sphere after a large number of additions or whether, once
a distortion occurs by chance, distortions tend to become exaggerated.
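The cube-growth process is easy to simulate, even if little can be proved about it. The following is a minimal sketch under one reading of the description: at each step every exposed face of the cluster is taken to be equally likely, so an empty cell touching several cubes is proportionally more likely to be filled. The function names, the integer-lattice representation, and the seeding are choices made here for illustration.

```python
import random

# The six directions in which a unit cube can have a neighbour.
FACES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def available_faces(occupied):
    """List one entry per exposed face: the empty cell that face looks onto."""
    return sorted((x + dx, y + dy, z + dz)
                  for (x, y, z) in occupied
                  for (dx, dy, dz) in FACES
                  if (x + dx, y + dy, z + dz) not in occupied)


def grow_cluster(n_cubes, seed=0):
    """Grow a cluster of n_cubes unit cubes, one uniformly chosen exposed face at a time."""
    rng = random.Random(seed)
    occupied = {(0, 0, 0)}
    while len(occupied) < n_cubes:
        occupied.add(rng.choice(available_faces(occupied)))
    return occupied


def is_connected(cells):
    """Check that every cube can be reached from any other through shared faces."""
    cells = set(cells)
    start = next(iter(cells))
    seen, frontier = {start}, [start]
    while frontier:
        x, y, z = frontier.pop()
        for dx, dy, dz in FACES:
            nxt = (x + dx, y + dy, z + dz)
            if nxt in cells and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen == cells


if __name__ == "__main__":
    cluster = grow_cluster(200, seed=1)
    extent = [max(c[i] for c in cluster) - min(c[i] for c in cluster) + 1
              for i in range(3)]
    print(len(cluster), extent)   # comparing the three extents hints at the shape
```

Recomputing the list of exposed faces at every step costs time proportional to the cluster size, which is acceptable for a few thousand cubes but not for a serious study of the limiting shape.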

It seems to the author that, just as psychology suggested the study of
interesting processes in learning theory, other areas of psychology should
become the source of new and interesting processes for the probabilists.
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Functional Equations 


In studying mathematics and its manifold applications, the
first equations that one meets are algebraic. One learns, progressively,
algorithms for the solution of linear equations of the
form ax + b = c, for the solution of linear systems

    \sum_j a_{ij} x_j = b_i,   i = 1, 2, \ldots,

and for quadratic equations such as

    ax^2 + 2bx + c = 0.

In all of these, the unknown is a number, or a set of numbers, rather than
an unknown function, or functions.

In elementary calculus, we meet, for the first time, the question of
determining an unknown function. The basic problem of integral
calculus is that of obtaining the function whose derivative is the given
function g(x). This is the simplest type of differential equation.

In courses in mechanics and physics, more complicated classes
of differential equations arise. For example, the investigation of Hooke's
law for the behavior of a spring under small deflections from equilibrium
leads to the equation

    d^2f/dx^2 + mf = 0,

the motion of the pendulum introduces the equation

    d^2f/dx^2 + k \sin f = 0,

and so on. These are all examples of equations involving unknown
functions, of the kind most familiar in the
scientific domain. Nonetheless, other types of equations expressing
properties of unknown functions play extremely important roles. To
illustrate this, let us discuss one way in which an equation of a quite different
nature arises in psychological research.

Suppose that the reaction to a stimulus is being studied by means of a
combination of experimental and theoretical techniques. If the stimulus
is of physical nature, say a light source, we can attach various parameters
to it: intensity, duration, etc. For simplicity, suppose that there is only
one parameter x and one reaction, measured by f(x).

Sometimes we can ascertain all the desired values of f(x) experimentally;
sometimes a priori theoretical arguments suffice. Most frequently an
amalgam of previous theoretical results, experiments, plausibility arguments,
and bold extrapolation yields certain relations for f(x) which, we
trust, determine f(x) completely. The values obtained from this analysis
are then compared with previous results of like nature and experimental
observation, leading to further experiments, revisions of the assumptions,
etc.; see Bellman and Brock (1960).

For example, in the situation described, we may surmise that the response
to an increase in the stimulus parameter is additive, namely,

    f(x + y) = f(x) + f(y).

This is an equation for the unknown function f(x), a simple, but extremely
important, example of a functional equation.
The function f(x) = kx, for any constant k, satisfies this equation. Is it,
however, the only solution? As we shall see in the following sections,
sometimes simple equations of this nature uniquely characterize a function,
and sometimes they do not.

The term "functional equation" is such a broad one that we shall make
no effort to define it. Operationally, we can conceive of all equations that
involve unknown functions and that escape the conventional areas of
differential, difference, and integral equations as constituting the domain
of functional equations. From this it is obvious that there is no routine
way to provide a survey of what is known about functional equations and
their applications nor any time-worn introduction to their study. Were
we even to restrict ourselves to the new and rapidly expanding area of
mathematical psychology, the task of classification and analysis would
still be overwhelming.

What we shall attempt to do in the following pages is to provide the
reader with a background of classical analysis that motivates the investigation
of many important classes of functional equations and that explains
some of their origins. From time to time we shall indicate a few of the
connections with contemporary research in psychology and provide some
references to current literature.

The most important single reference is Aczél (1961). This work presents
a lucid and detailed exposition and contains an almost complete
bibliography.


1 DIFFERENCE EQUATIONS AND ITERATION

1.1 Difference Equations

A standard mathematical approach to the study of a physical system,
where the adjective physical is here used in a broad sense, is the
introduction of a number of state variables, x_1, x_2, \ldots, x_N, each a measure
of a different property of the system: position and velocity,
voltage and current, generalized coordinates, probability
distributions, etc. Usually, these properties change over time.
Hence, we write x_i(t).

In order to ascertain the behavior of the system, we must make
various assumptions concerning these functions. One of the simplest
and, perhaps, most important mathematical models of the system
(or a process; we shall use the terms interchangeably) is obtained by
assuming that the immediate future depends only on the present, with
complete independence of the past. To derive an analytic consequence
of this hypothesis, let us proceed as follows.

Suppose that time varies in a discrete fashion, so that t assumes only
the values 0, 1, 2, \ldots. The assumption that the values x_i(t + 1)
(the state of the system in the immediate future) depend only on the
present values x_1(t), x_2(t), \ldots, x_N(t) leads to the set of relations

    x_i(t + 1) = g_i[x_1(t), x_2(t), \ldots, x_N(t)],   i = 1, 2, \ldots, N,    (1)

t = 0, 1, 2, \ldots, where the g_i are functions which we are free to
prescribe. Various assumptions lead to different predicted behaviors of
the system. Comparing these predictions with observations and other
data, we can test the validity of the assumptions.

The simplest assumption is that the g_i are linear functions of their
arguments,

    x_i(t + 1) = \sum_{j=1}^{N} a_{ij} x_j(t),   i = 1, 2, \ldots, N.    (2)

It is also important in a number of investigations to consider processes
in which the dependence changes in the course of time. This is most simply
done by letting the coefficients depend on time, that is,

    a_{ij} = a_{ij}(t).    (3)

The introduction of elementary vector notation greatly simplifies the
typography. Letting

    x(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T,   g(x) = [g_1(x), g_2(x), \ldots, g_N(x)]^T,

we can write Eq 1 in the compact form

    x(t + 1) = g[x(t)].    (4)

If A denotes the N x N matrix (a_{ij}), i, j = 1, 2, \ldots, N, the linear
relations of Eq 2 may be written

    x(t + 1) = Ax(t).    (5)

Equations of the preceding type, Eqs 4 and 5, are called difference or
recurrence equations. An introductory account may be found
in Goldberg (1958); for a discussion of the connections between matrix
theory and equations of the type appearing in Eq 5, see Bellman (1960).
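Relations of the form x(t + 1) = g[x(t)] and x(t + 1) = Ax(t) translate almost word for word into code. The sketch below is illustrative only; the function names and the particular 2 x 2 matrix are choices made here. It generates the time-history of a state vector for the linear case and shows that the general routine reproduces it when g is taken to be multiplication by A.

```python
def mat_vec(a, x):
    """Product of the matrix a (a list of rows) with the vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def iterate_general(g, x0, steps):
    """Return [x(0), x(1), ..., x(steps)] for x(t + 1) = g[x(t)]."""
    states = [list(x0)]
    for _ in range(steps):
        states.append(list(g(states[-1])))
    return states


def iterate_linear(a, x0, steps):
    """Return [x(0), x(1), ..., x(steps)] for x(t + 1) = A x(t)."""
    return iterate_general(lambda x: mat_vec(a, x), x0, steps)


if __name__ == "__main__":
    A = [[1, 1],
         [1, 0]]
    for t, state in enumerate(iterate_linear(A, [1, 0], 5)):
        print(t, state)
```

Only the present state is ever passed to g, which is exactly the hypothesis that the immediate future is independent of the past.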


1.2 Iteration

From the previous discussion, in which the analytic aspects are merely
a rendering in mathematical terminology of rather immediate ideas, let us
turn to the scalar equation

    u(t + 1) = g[u(t)],   u(0) = c,   t = 0, 1, 2, \ldots,    (6)

and write, for n = 0, 1, 2, \ldots,

    u(n) = f_n(c),

indicating the dependence on the initial value c as well as the dependence
on the time n. The function f_n(c) satisfies the basic functional equation

    f_{m+n}(c) = f_m[f_n(c)]    (7)

for m, n = 0, 1, 2, \ldots, and all c. This may be regarded as an expression
of the law of causality; see the discussion in Bellman (1961). Schematically,

    c  ->  f_n(c)  ->  f_m[f_n(c)] = f_{m+n}(c).

Verbally, this states that a system in state c allowed to proceed for a time
m + n ends up in the same state as a system that is allowed to proceed
for a time n, stopped, and then, starting in the new state f_n(c), allowed to
proceed for a further time m. We shall meet this relation again in connection
with differential equations.

Observe that Eq 7 is valid only for discrete values of m and n, m, n = 0,
1, 2, \ldots. It turns out to be quite difficult to
find a function of two variables, g(c, t), defined for all c and all t \ge 0,
with the property that

    g(c, n) = f_n(c)    (8)

for n = 0, 1, 2, \ldots, and that

    g(c, t + s) = g[g(c, t), s]    (9)

for all nonnegative t and s.

This is an interpolation problem, and it is reasonable to suspect that it is
one of great difficulty.

Parenthetically, let it be observed that problems of this general type are
classical. The function

    \Gamma(x) = \int_0^\infty e^{-t} t^{x-1} dt

(the gamma function), defined in this way for Re(x) > 0, and by analytic
continuation as a meromorphic function of x over the rest of the complex
plane, satisfies the functional equation

    \Gamma(x + 1) = x\Gamma(x).

Hence \Gamma(x + 1) = x! if x is a positive integer. Thus, the function defined
previously by the integral represents a continuous interpolation of the
factorials.
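The interpolation property of the gamma function can be checked numerically with the `math.gamma` function of the Python standard library. The helper name below is invented for illustration; the script compares the gamma function with the factorials at the integers, evaluates it at a point in between, and measures how closely the functional equation holds at a non-integer argument.

```python
import math


def recurrence_gap(x):
    """Relative size of Gamma(x + 1) - x * Gamma(x); zero up to rounding error."""
    lhs = math.gamma(x + 1)
    return abs(lhs - x * math.gamma(x)) / abs(lhs)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, math.gamma(n + 1), math.factorial(n))
    print(math.gamma(3.5))          # a value lying between 2! and 3!
    print(recurrence_gap(2.7))
```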

Iteration and the theories that flow from this fundamental concept occupy
a central role in classical and modern analysis (Hadamard, 1944; Montel,
1957). In a natural way, iteration generalizes to produce the theory of
branching processes (Harris, 1963) and the theory of semigroups (Hille &
Phillips, 1948). In connection with multistage decision processes and
modern control theory, it leads to dynamic programming (Bellman, 1957,
1961; Bellman & Dreyfus, 1962); in mathematical physics, it leads to
invariant imbedding (..., 1956; Bellman, Kalaba, & Prestrud, 1963).


1.3 The Abel-Schröder Equation

A problem closely related to the interpolation problem
posed in the foregoing section is the following. Given the function f(c),
can we find a function g(c) and a constant k such that

    g[f(c)] = kg(c)?    (10)

Such a function g(c) is a relative invariant of the transformation f(c).
Eq 10 is, in analysis, the equation of Abel-Schröder.

To see the connection with the previous problem, suppose that there is
a function g(c) that satisfies Eq 10. Then

    g\{f[f(c)]\} = kg[f(c)] = k^2 g(c),

and an easy induction shows that

    g[f^{(n)}(c)] = k^n g(c)

for n a positive integer. Hence

    f^{(n)}(c) = g^{-1}[k^n g(c)].

Introduce the function

    G(c, t) = g^{-1}[k^t g(c)]

for t \ge 0. It follows immediately that

    G(c, t + s) = G[G(c, t), s].

Thus G(c, t) is a solution of the interpolation problem posed in Eq 9,
Sec 1.2.

A psychological problem leads to an equation of this type.
If the physically measured jnd at c is \Delta(c), the problem is to find a
function u(c) such that

    u[c + \Delta(c)] - u(c) = K,

where K is a positive constant. If in Eq 10 we take f(c) = c + \Delta(c),
g(c) = e^{u(c)}, and k = e^K, then it is clear that this condition is
equivalent to Eq 10. This problem is discussed in detail in Luce and
Edwards (1958) and in Luce and Galanter (1963, pp. 206-213).
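The passage from a solution of Eq 10 to an interpolating function G(c, t) can be tried out on a concrete case. The transformation f(c) = c/(2 + c) used below is an example chosen here, not one taken from the text; for it, g(c) = c/(1 + c) satisfies g[f(c)] = kg(c) with k = 1/2, as a line of algebra confirms. The sketch checks Eq 10, the agreement of G(c, n) with the n-th iterate, and the relation G(c, t + s) = G[G(c, t), s] for non-integer t and s. It is restricted to c > 0, where all the quantities involved are well defined.

```python
K = 0.5                               # the constant k of Eq 10 for this example


def f(c):
    """An illustrative transformation with fixed point 0 and f'(0) = 1/2."""
    return c / (2 + c)


def g(c):
    """A solution of g[f(c)] = K * g(c) for the f above."""
    return c / (1 + c)


def g_inverse(w):
    """Inverse of g on 0 <= w < 1."""
    return w / (1 - w)


def iterate(c, n):
    """The n-th iterate of f applied to c."""
    for _ in range(n):
        c = f(c)
    return c


def G(c, t):
    """G(c, t) = g^{-1}[k^t g(c)], defined for every real t >= 0."""
    return g_inverse(K ** t * g(c))


if __name__ == "__main__":
    c = 0.7
    print(g(f(c)), K * g(c))                     # Eq 10
    print(G(c, 3), iterate(c, 3))                # G interpolates the iterates
    print(G(c, 1.75), G(G(c, 1.25), 0.5))        # the relation of Eq 9
```

Fractional "half steps" such as G(c, 0.5) thus acquire a meaning even though f itself is only ever applied a whole number of times.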

1.4 Heuristic Derivation of the Linearizing Function

The function g(c) of Eq 10 is connected with the
asymptotic or steady-state behavior of the iterates of f(c).
Consider the special case where f(c) is an analytic function of c in a
circle |c| \le c_0,

    f(c) = b_1 c + b_2 c^2 + \cdots,

with b_1 \ne 0. Introduce the function

    F_n(c) = f^{(n)}(c) / b_1^n

and suppose that the sequence {F_n(c)} approaches a limit as n \to \infty. Let

    g(c) = \lim_{n \to \infty} f^{(n)}(c) / b_1^n.

Then

    g[f(c)] = \lim_{n \to \infty} f^{(n)}[f(c)] / b_1^n
            = \lim_{n \to \infty} f^{(n+1)}(c) / b_1^n
            = b_1 \lim_{n \to \infty} f^{(n+1)}(c) / b_1^{n+1}
            = b_1 g(c).    (11)

Thus we obtain the function required in Eq 10.

In the case where 0 < |b_1| < 1, it is not difficult to establish the existence
of an analytic function g(c) = c + \cdots in a sufficiently small
neighborhood of the origin. The result was first given by Schröder (1881)
and Koenigs (1885). For a discussion of the two-dimensional case and
references to other and previous work on the multidimensional case, see
Bellman (1952b). For detailed accounts, see the books by Harris and
Montel referred to previously. The area is one which requires very careful
analysis and often some deep and delicate considerations.
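The limit defining g(c) in this section can be approximated by simply stopping at a finite n. The sketch below does this for the illustrative transformation f(c) = c/(2 + c), for which b_1 = 1/2 and the limit can be worked out by hand to be c/(1 + c); both the example and the function names are assumptions made here. It then checks the relation g[f(c)] = b_1 g(c) of Eq 11 numerically.

```python
def f_example(c):
    """Illustrative analytic transformation: f(c) = c/2 - c**2/4 + ..., so b_1 = 1/2."""
    return c / (2 + c)


def linearizing_function(f, b1, c, n=40):
    """Approximate g(c) = lim f^(n)(c) / b1**n by stopping at a finite n."""
    value = c
    for _ in range(n):
        value = f(value)
    return value / b1 ** n


if __name__ == "__main__":
    c = 0.8
    approx = linearizing_function(f_example, 0.5, c)
    print(approx, c / (1 + c))                            # numerical limit vs closed form
    print(linearizing_function(f_example, 0.5, f_example(c)), 0.5 * approx)
```

The division by b_1^n magnifies rounding errors as n grows, so a moderate n is preferable to a very large one; this is a small reminder of the remark that the area calls for careful analysis.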


1.5 Stochastic Iteration

In the study of some situations, we encounter processes for which there
is a possibility of one of several transformations being applied at any stage.
Considering the simple case in which there are two possible transformations,
f_1(c) and f_2(c), with respective probabilities p_1 and p_2, we are led to the
functional equation

    p_1 g[f_1(c)] + p_2 g[f_2(c)] = kg(c).

See Bellman (1964) for a further discussion.


2 DIFFERENTIAL EQUATIONS AND
FUNCTIONAL EQUATIONS

2.1 Introduction

We are led to systems of differential equations of the form

    dx_i/dt = g_i(x_1, x_2, \ldots, x_N),   x_i(0) = c_i,   i = 1, 2, \ldots, N,    (12)

either as limiting forms of the class of difference equations already
considered or as direct consequences of postulating that the rates of change of
each state variable depend only on the current values of the state vector
x(t).

In the following sections, we wish to indicate connections between the
solutions of equations of this nature, the preceding results, and functional
equations in general.
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2 2 Causality and Functional Equations 
Uniqueness of solution, which is to say causality, enables us to derive a
fundamental functional equation. Write

    x(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T

and

    x(t) = f(c, t),

indicating explicitly the dependence on the initial state c.
Then, as before, assuming uniqueness, we obtain the fundamental
relation

    f(c, t + s) = f[f(c, t), s]    (13)

for s, t \ge 0, and the values of c for which a solution of Eq 12 can be
continued over time.

The connection with iteration is readily established. The value of
the solution of Eq 12 at t = 1 is a function of c,

    u(1) = p(c).

It follows that u(2) = p[p(c)] = p^{(2)}(c) and, generally, u(n) = p^{(n)}(c).

If p(u) = a_1 u + a_2 u^2 + \cdots, can every such one-dimensional analytic
transformation be generated in this way by a differential equation

    du/dt = g(u),   u(0) = c?

For the case where 0 < |a_1| < 1, let h(u) be the linearizing function of
Sec 1.4, so that

    h[p(u)] = a_1 h(u),    (14)

and write

    f(c, t) = h^{-1}[a_1^t h(c)].

Using the fact that f satisfies Eq 13, we have

    \partial f/\partial t = \partial f[f(c, t), s]/\partial s |_{s=0} = g[f(c, t)],    (15)

where g(u) is defined by

    g(u) = \partial f(u, t)/\partial t |_{t=0}.

This yields g(u) explicitly in terms of Eq 14, if we use the representation
of Eq 15. The multidimensional analog is readily obtained.

2.3 Functional Equation for Exponential

The case where g(u) = au leads to an interesting result. Write the
solution of the linear differential equation

    du/dt = au,   u(0) = c,

in the form u = E(t)c. The basic functional equation of Eq 13 yields the
relation

    E(t + s)c = E(s)E(t)c

for all c, and therefore

    E(t + s) = E(s)E(t),    (16)

which is the fundamental property of the exponential function E(t) = e^{at}.

Similarly, defining the matrix exponential e^{At} as the solution of the
matrix differential equation

    dX/dt = AX,   X(0) = I

(where I is the identity matrix), we obtain the functional equation

    e^{A(t+s)} = e^{At}e^{As}

for all t and s. However, unlike the scalar case, e^{(A+B)t} = e^{At}e^{Bt}
for all t if and only if AB = BA; see Bellman (1960) for further discussion
of the matrix exponential. For general operators A, a study of the exponential
leads to the domain of semigroups (Hille & Phillips, 1948).
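Both statements about the matrix exponential can be checked numerically. The sketch below does not solve the differential equation dX/dt = AX; it substitutes the power series I + A + A^2/2! + ..., a standard equivalent representation, truncated after a fixed number of terms. The helper names and the test matrices are illustrative choices. The first check confirms e^{A(t+s)} = e^{At}e^{As}; the second exhibits two matrices with AB different from BA for which e^{A+B} and e^{A}e^{B} differ.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(a, s):
    return [[s * x for x in row] for row in a]


def mat_exp(a, terms=30):
    """Partial sum of the series I + A + A^2/2! + ... (adequate for small matrices)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_scale(mat_mul(term, a), 1.0 / k)   # now equal to A^k / k!
        result = mat_add(result, term)
    return result


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    A = [[0.0, 1.0], [-1.0, 0.0]]
    t, s = 0.3, 0.9
    lhs = mat_exp(mat_scale(A, t + s))
    rhs = mat_mul(mat_exp(mat_scale(A, t)), mat_exp(mat_scale(A, s)))
    print(max_abs_diff(lhs, rhs))                      # essentially zero

    P = [[0.0, 1.0], [0.0, 0.0]]
    Q = [[0.0, 0.0], [1.0, 0.0]]                       # P and Q do not commute
    print(max_abs_diff(mat_exp(mat_add(P, Q)), mat_mul(mat_exp(P), mat_exp(Q))))
```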
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2 4 Cauchy’s Problem 

Cauchy asked whether the functional equation of Eq 16 characterizes
the exponential function. We are thus led to study

    u(t + s) = u(t)u(s)

as an equation for an unknown function u(t). As we
will show, the problem is more sophisticated than one might guess.

2.5 The Linear Function

Taking logarithms in Eq 16, we are led to the equation

    F(t + s) = F(t) + F(s)    (17)

for F(t) = \log E(t), assumed to hold for -\infty < t, s < \infty. Hence it is
sufficient at the moment to examine when the solution of Eq 17 is given
by F(t) = kt for some constant k.

To make the problem precise we must define the class of functions under
consideration. Suppose that we consider only functions which
possess derivatives for all finite t. Then, differentiating with respect to t
and then with respect to s, we obtain the two equations

    F'(t + s) = F'(t),
    F'(t + s) = F'(s).

Differentiation is a standard device in the treatment of such equations.
Thus, and this is also a standard step in all such procedures,

    F'(t) = F'(s) = k,

a constant. It follows that F(t) = kt + b. Since F(0) = 0, from Eq 17,
there remains the linear solution F(t) = kt. Many difficult questions
arise concerning the conditions under which the linear solutions of Eq 17
are the only ones, in the scalar case and in the vector case; see Aczél (1961)
for a very detailed and lucid discussion of these matters and for many
further references; for a brief discussion of the matrix case, see Bellman
(1960).

The solutions of Eq 17 which are nonlinear are quite weird and cannot
be characterized simply. Their existence, as noted previously, depends
upon an acceptance of the axiom of choice.


2.6 Associated Functional Equations

Starting with Eq 17, we are led to the equations

    F_1(t + s) = F_1(t)F_1(s),    (F_1 = e^t),
    F_2(ts) = F_2(t) + F_2(s),    (F_2 = \log t),
    F_3(ts) = F_3(t)F_3(s),       (F_3 = t^k),

all equivalent, under various conditions of regularity, to Eq 17 upon
changes of dependent and independent variable. These equations arise
frequently in mathematical analysis and applications. Once again, the
reader is referred to Aczél (1961).

The matrix equation

    F(AB) = F(A)F(B)

is quite interesting since under various assumptions it characterizes the
determinant of A; see Aczél (1961).


2.7 General Functional Equations

It is clear that we have discovered a new game. Start with any elementary
functions, say sin t and cos t, and write down their addition
theorems:

    S(t + s) = S(t)C(s) + C(t)S(s),
    C(t + s) = C(t)C(s) - S(t)S(s).

Under what conditions do these relations uniquely determine the original
functions?

Continuing in this vein, we may ask for the determination of a function
f(t) satisfying a given polynomial relation.

The general problem is quite difficult since we enter the domain of elliptic
and Abelian functions; following the lead of Weierstrass, see Rausenberger
(1884) and Weierstrass and Schwarz (1893). A number of particular
equations of this type are discussed in Aczél (1961).


2.8 Approximate Functional Equations

A fundamental class of problems in analysis and mathematical physics
is that associated with the concept of stability. In verbal terms, the
question may be stated in the following way: If the equation is altered
slightly, are the solutions altered slightly?

For example, if we have a polynomial equation

    x^n + a_1 x^{n-1} + \cdots + a_n = 0,

it is not difficult to show that small changes in the coefficients produce
correspondingly small changes in the roots. In other words, the roots are
continuous functions of the coefficients.

If we have a differential equation, such as

    d^2u/dt^2 + a(t)u = 0,

it is a much more difficult matter to relate the properties of a(t) to the
properties of u; see Bellman (195...).

In general, in scientific investigation, it is the study of inequalities that
is far more important than the study of equations. Given an equation
F(u) = 0 with a solution u, we want to know whether |F(v)| \le \epsilon, where
\epsilon is a small quantity, implies that v is close to u. This is often quite
difficult.

In connection with the functional equation of the previous section, we
can ask whether

    |f(x + y) - f(x) - f(y)| \le \epsilon

for all x and y implies that f(x) differs little from kx for some constant k.
For a discussion of this question, see Hyers (1941); in this case the
answer is affirmative.


2.9 Erdős' Functional Equation

Alternatively, we can retain the equation and restrict the domain of the
function. Suppose that f(t) is defined only for t = 1, 2, \ldots, and that we
assume that

    f(mn) = f(m)f(n)

if m and n are relatively prime. It is not true, in general, that f(n) = n^k, as
in Sec 2.6.

Number-theoretic functions satisfying the preceding equation are
called multiplicative. For example, d(n), the number of divisors of an
integer n, is a multiplicative function. If we add the condition that
f(n + 1) > f(n), then it is true that f(n) = n^k for some constant k; see
Erdős (1946).
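The distinction drawn here between a merely multiplicative function and one that is also increasing can be made concrete by brute force. In the sketch below, which uses names invented for illustration and checks only a finite range, d(n) passes the multiplicativity test but is neither a power of n nor increasing, while n^3 passes both tests.

```python
from math import gcd


def num_divisors(n):
    """d(n): the number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def is_multiplicative(f, limit):
    """Check f(m*n) == f(m)*f(n) for all relatively prime m, n <= limit."""
    return all(f(m * n) == f(m) * f(n)
               for m in range(1, limit + 1)
               for n in range(1, limit + 1)
               if gcd(m, n) == 1)


def is_increasing(f, limit):
    """Check f(n + 1) > f(n) for 1 <= n < limit."""
    return all(f(n + 1) > f(n) for n in range(1, limit))


if __name__ == "__main__":
    print(is_multiplicative(num_divisors, 30), is_increasing(num_divisors, 30))
    print(num_divisors(4), num_divisors(2) ** 2)   # fails when m and n share a factor
    cube = lambda n: n ** 3
    print(is_multiplicative(cube, 30), is_increasing(cube, 30))
```

A finite check of this kind proves nothing about all n, of course; it only illustrates what the two conditions say.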

The problem of determining when a polynomial equation of the form

    P[f_1(n), f_2(n), \ldots, f_k(n)] = 0

can hold, with each f_i(n) a multiplicative number-theoretic function, has
also been studied (Bellman & Shapiro, 1948).


2.10 A Limit Theorem

If u(x + y) = u(x) + u(y) for x, y \ge 0, and u(x) is continuous in x, we
know that u(x) = kx for some constant k. If this equation holds only for
x, y = 0, 1, 2, \ldots, we can draw the same conclusion.

In many investigations, we can only deduce that

    u(m + n) \le u(m) + u(n)

for m, n = 0, 1, 2, \ldots. Nevertheless, we can assert that u(n) acts like
kn for large n in the sense that

    \lim_{n \to \infty} u(n)/n

exists; see Pólya and Szegő (1945).
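A numerical illustration of this limit theorem is straightforward. The sequence u(n) = 2n + sqrt(n) used below is an example chosen here; it satisfies the inequality because the square root of a sum never exceeds the sum of the square roots, yet it is not additive. The sketch checks the inequality over a finite range and prints the ratios u(n)/n, which can be seen settling toward 2.

```python
import math


def u(n):
    """An illustrative sequence with u(m + n) <= u(m) + u(n): u(n) = 2n + sqrt(n)."""
    return 2 * n + math.sqrt(n)


def is_subadditive(seq, limit, tol=1e-12):
    """Check seq(m + n) <= seq(m) + seq(n) for 0 <= m, n <= limit."""
    return all(seq(m + n) <= seq(m) + seq(n) + tol
               for m in range(limit + 1)
               for n in range(limit + 1))


def ratios(seq, ns):
    """The values seq(n) / n for the given n."""
    return [seq(n) / n for n in ns]


if __name__ == "__main__":
    print(is_subadditive(u, 60))
    print(ratios(u, [1, 10, 100, 1000, 10 ** 6]))
```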


3 PROBABILITY THEORY

3.1 Preliminary Remarks

The theory of probability is a breeding ground for functional equations.
In very natural ways, investigations of utilities, means, laws of probability
theory, etc., pursued in axiomatic fashion, lead to the question of determining
all acceptable solutions of various classes of functional equations
(..., 1958, 1959a, b; Marcus, 1962; Richter, 1952a, b, c,
1954; Shannon & Weaver, 1949). Many additional references may be
found in Aczél (1961).

Many areas in modern probability theory can be
considered to pertain to the iteration of random transformations; see
Bharucha-Reid (1960) and Harris (1963). In particular, we most frequently
encounter the problem of describing the effect of a large number of random
effects, which is to say, of determining the limiting distribution of
the sum

    x_1 + x_2 + \cdots + x_n,

where the x_i are independent random variables with given distribution
functions. More general processes are treated in
Harris (1963), Dvoretzky (1956), and Be...

Interesting functional equations arise in the following fashion. Let x
be a random variable with the density function g(x) and y an independent
random variable with the density function h(y). Then z = x + y has the
density function k(z), where

    k(z) = \int_{-\infty}^{\infty} g(z - y) h(y) dy = \int_{-\infty}^{\infty} h(z - y) g(y) dy.

Suppose that g and h are both members of a family of density functions
f(x, a), with parameter values a and b respectively, and that the density of
z = x + y is also a member of this family, say f[z, r(a, b)].

For example, if x and y have the Gaussian density functions

    g(x) = e^{-(x - a_1)^2 / 2b_1^2} / (\sqrt{2\pi} b_1)

and

    h(y) = e^{-(y - a_2)^2 / 2b_2^2} / (\sqrt{2\pi} b_2),

then the random variable z = x + y also has a Gaussian density. ...
Under suitable assumptions, a functional equation of this kind
leads to the conclusion that f(x) is of Gaussian form.
A deeper study of this question leads
to the theory of infinitely divisible
distributions (Gnedenko, 1962).
3.2 Markov Processes

Consider, as a basis for what we discuss subsequently, a system which
at the discrete times t = 0, 1, 2, \ldots is in one of N states, and suppose
that there is a probability a_{ij} that a system in state j will be transformed
into state i in a unit time. The matrix A(t) = (a_{ij}) is called the transition
matrix. For simplicity, let us consider the stationary case, in which the
transition matrix is constant.

Markov processes allow us to treat on a phenomenological level systems
that we do not understand well enough to treat by means of differential or
difference equations. For a systematic and very readable account of the
application of these techniques to the domains of mathematical physics and
biology, see Bharucha-Reid (1960).

Let x_i(t) denote the probability that the system is in state i at time t,
i = 1, 2, \ldots, N. Letting one time interval go by, we readily see
that

    x_i(t + 1) = \sum_{j=1}^{N} a_{ij} x_j(t),   i = 1, 2, \ldots, N.

The study of the process is thus equivalent to the iteration of the
matrix A. Once again, the time-history of a system introduces iteration.

If we consider the case where we iterate random matrices, say

    X_N = Z_N Z_{N-1} \cdots Z_1 X_0,

where each Z_i is M with probability p and U with probability 1 - p, we
obtain other types of functional equations. ... The study of the limiting
distribution of X_N is quite difficult; see Bellman (19...), ... (1960); for a
generalization of Markovian processes using positive definite
matrices as probabilities, see Bellman (1...a).

For an introduction to the theory of Markov chains, see Kemeny and
Snell (1960).


3.3 Random Walks

A most interesting type of Markov process is the random walk.
Consider the interval [a, b] and mark the lattice points

    a, a + 1, \ldots, k, \ldots, b - 1, b,

and assume that a particle at an interior point k has a probability p_k of
moving one step to the left and a probability 1 - p_k of moving one step to
the right in a unit time. Starting at k, we may want to know the expected
time to reach a boundary, the probability of reaching a before b, etc.
Processes of this nature have been extensively studied; see Bharucha-Reid
(1960) and Feller (1957).

To treat the second question, let

    u_k = the probability of arriving at a before b, starting at k,

for a < k < b. Then, it is easy to see that

    u_k = p_k u_{k-1} + (1 - p_k) u_{k+1},    (18)

with u_a = 1, u_b = 0.

An alternate approach to problems of this type, leading to other
types of functional equations, may be obtained from the theory of invariant
imbedding (Bellman ...; Bellman, Kalaba, & Wing, 196...).
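The boundary-value problem for the arrival probabilities, u_k = p_k u_{k-1} + (1 - p_k) u_{k+1} with u_a = 1 and u_b = 0, can be solved exactly on a small lattice. The sketch below is one possible approach, written with exact rational arithmetic; the function name is invented, and it assumes 0 < p_k < 1 at every interior point. It uses the observation that the differences d_k = u_k - u_{k-1} satisfy (1 - p_k) d_{k+1} = p_k d_k, so they are determined up to a common factor, which is then fixed by the boundary values.

```python
from fractions import Fraction


def arrival_probabilities(a, b, p_left):
    """Return {k: u_k} for a <= k <= b, where u_k = Pr[reach a before b | start at k].

    p_left(k) is the probability of a step to the left from the interior point k.
    """
    diffs = [Fraction(1)]                      # d_{a+1}, up to a common factor
    for k in range(a + 1, b):
        p = Fraction(p_left(k))
        diffs.append(diffs[-1] * p / (1 - p))  # d_{k+1} = d_k * p_k / (1 - p_k)
    scale = Fraction(-1) / sum(diffs)          # forces u_b - u_a = -1
    u = {a: Fraction(1)}
    for k, d in zip(range(a + 1, b + 1), diffs):
        u[k] = u[k - 1] + scale * d
    return u


if __name__ == "__main__":
    fair = arrival_probabilities(0, 4, lambda k: Fraction(1, 2))
    print([str(fair[k]) for k in range(5)])
    biased = arrival_probabilities(0, 3, lambda k: Fraction(2, 3))
    print([str(biased[k]) for k in range(4)])
```

In the symmetric case the probabilities fall off linearly from 1 at a to 0 at b, as one would expect.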


3 4 Learning Processes

The last two sections set the stage for important applications to learning
processes. Functional equations are basic in this area; see, for example,
Bellman, Harris, and Shapiro (1953), Bush and Estes (1959), Bush and
Mosteller (1955), Kanal (1962), Karlin (1953), and Luce (1959a, 1964).
To see how learning processes lead to functional equations under
suitable hypotheses, consider the random walk model in Sec 3 3 and
suppose that experience modifies behavior in such a way that a move to
the right or left changes the probability of moving to the right or left.
Take the $p_k$ to be independent of $k$ and all equal to $p$. If the particle
moves to the left, $p$ is replaced by $T_1(p)$; if it moves to the right, $p$ is
replaced by $T_2(p)$.
In place of the function $u_k$ of Sec 3 3, we introduce the function

$u_k(p)$ = the probability of arriving at $a$ before $b$, starting at $k$ with
the probability $p$ of moving to the left.

The same reasoning as before yields the recurrence functional equation

$$u_k(p) = p\,u_{k-1}[T_1(p)] + (1 - p)\,u_{k+1}[T_2(p)]. \qquad (19)$$

Equations of this type are treated in the papers and books cited
previously. Aspects of equations of this nature are surprisingly complex.
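A recurrence of the form $u_k(p) = p\,u_{k-1}[T_1(p)] + (1 - p)\,u_{k+1}[T_2(p)]$, with $u_a(p) = 1$ and $u_b(p) = 0$, has no closed form in general, but it can be explored numerically by limiting the number of moves. The Python sketch below is an illustration only: it computes the probability of reaching $a$ before $b$ within `max_steps` moves, which is a lower bound on $u_k(p)$. The function name, the truncation, and the two linear operators in the usage lines are assumptions of this example.

```python
from functools import lru_cache

def arrival_probability(k, p, a, b, t_left, t_right, max_steps):
    """Probability of reaching a before b within max_steps moves.

    Starting at lattice point k, the particle steps left with
    probability p and right with probability 1 - p.  After a left step
    p becomes t_left(p); after a right step it becomes t_right(p).
    """
    @lru_cache(maxsize=None)
    def u(k, p, steps):
        if k == a:
            return 1.0
        if k == b or steps == 0:
            return 0.0  # absorbed at b, or the horizon is exhausted
        return (p * u(k - 1, t_left(p), steps - 1)
                + (1.0 - p) * u(k + 1, t_right(p), steps - 1))
    return u(k, p, max_steps)

# Hypothetical linear operators, chosen only for illustration:
# a left step raises p, a right step lowers it.
alpha = 0.8
reinforce_left = lambda p: alpha * p + (1.0 - alpha)
reinforce_right = lambda p: alpha * p

print(arrival_probability(2, 0.5, 0, 4, reinforce_left, reinforce_right, 16))
```

When both operators are the identity, the model reduces to the ordinary random walk, which gives a simple check on the code. With operators that do not commute, the number of distinct values of $p$ grows quickly with the horizon, so `max_steps` should be kept small.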


4 MARKOVIAN DECISION PROCESSES
4 1 Dynamic Programming 

An extension of Markov processes is the class of Markovian decision
processes, an important type of dynamic programming process. In place
of the recurrence relations of Eqs 18 or 19, we have equations of the form

$$u_k(p) = \max_q \{ g(p, q) + p(q)\,u_{k-1}[T_1(p, q)] + [1 - p(q)]\,u_{k+1}[T_2(p, q)] \}. \qquad (20)$$

These equations arise in a variety of fields: in equipment replacement
theory, in inventory, in control theory, in sequential analysis, throughout
operations research, etc. See Bellman (1957, 1961) and Bellman and
Dreyfus (1962).

Since dynamic programming will be required for a discussion of
the functional equations associated with these processes, let us
briefly review some of the fundamental concepts of the theory.

Dynamic programming is a mathematical theory of multistage decision
processes. As opposed to classical analysis, the emphasis is on policies, a
generalization and extension of the concept of feedback control. A policy,
quite simply, is a rule that tells what you do in terms of where you are and
what you know. It is clear that dynamic programming can be used in the
study of a number of psychological processes.

The determination of optimal policies, those yielding maximum returns
or most desirable objectives, is facilitated by the following intuitive
principle of optimality: An optimal policy has the property that
whatever the initial state and initial decision are, the remaining decisions
must constitute an optimal policy with regard to the state resulting from the
first decision.

To study the analytic structure of optimal policies and to derive
computational algorithms, we proceed in the following way. The state of
the system is specified by a point $p$ in generalized phase space, the state
vector. A decision is equivalent to a transformation $T(p, q)$, where $q$ is
the decision variable. As a result of this decision, a return of $g(p, q)$ is
obtained. The problem is to make $N$ decisions which maximize the
over-all return. Let

$f_N(p)$ = the return from an $N$-stage process using an optimal policy,
starting in state $p$.

Then the principle of optimality yields

$$f_N(p) = \max_q \{ g(p, q) + f_{N-1}[T(p, q)] \}, \qquad N \ge 2,$$

with

$$f_1(p) = \max_q g(p, q).$$
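The recurrence $f_N(p) = \max_q \{ g(p, q) + f_{N-1}[T(p, q)] \}$ with $f_1(p) = \max_q g(p, q)$ translates almost line for line into a memoized recursion when the states and decisions are finite. The Python sketch below is a minimal illustration under that assumption; the names `optimal_return` and `decisions`, and the toy return function in the usage lines, are hypothetical and are not taken from the text.

```python
from functools import lru_cache

def optimal_return(p, n_stages, decisions, g, T):
    """Return (f_N(p), optimal first decision) for an N-stage process.

    decisions(p) lists the admissible decisions q in state p (assumed
    nonempty), g(p, q) is the one-stage return, and T(p, q) is the
    resulting state.  States must be hashable so results can be cached.
    """
    @lru_cache(maxsize=None)
    def f(state, n):
        best_value, best_q = None, None
        for q in decisions(state):
            value = g(state, q)
            if n > 1:
                # Principle of optimality: continue optimally from T(state, q).
                value += f(T(state, q), n - 1)[0]
            if best_value is None or value > best_value:
                best_value, best_q = value, q
        return best_value, best_q
    return f(p, n_stages)

# Toy example: the state is a stock of units, a decision uses q of them,
# and using q units at one stage returns q * (6 - q).
result = optimal_return(
    4, 2,
    decisions=lambda p: range(p + 1),
    g=lambda p, q: q * (6 - q),
    T=lambda p, q: p - q,
)
print(result)  # prints (16, 2): use 2 units at each of the two stages
```

Returning the maximizing decision along with the value makes the optimal policy itself available, which is the object the theory emphasizes.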


For extensive discussion of equations of this nature, together with
applications across the sciences and numerical illustrations,
see the books by Bellman and by Bellman and Dreyfus (1962).
In particular, let us note that dynamic programming has
particular relevance to feedback control processes. With its
aid, we can formulate the calculus of variations as a semigroup process
involving decisions at each stage, and we can study stochastic and adaptive
variational processes; see Bellman (1961) and Bellman and Dreyfus (1962).


4 2 Allocation Processes and the Maximum Transform 


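As an example of the use of these methods, let us consider the problem
of maximizing the function

$$R_N(x_1, x_2, \ldots, x_N) = g_1(x_1) + g_2(x_2) + \cdots + g_N(x_N)$$

under the constraints

$$x_1 + x_2 + \cdots + x_N = x, \qquad x_i \ge 0, \quad i = 1, 2, \ldots, N.$$

Considering this as a multistage process, write

$$f_N(x) = \max_{\{x_i\}} R_N. \qquad (21)$$

Then

$$f_N(x) = \max_{0 \le x_N \le x} [g_N(x_N) + f_{N-1}(x - x_N)], \qquad N \ge 2,$$
$$f_1(x) = g_1(x). \qquad (22)$$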
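When the quantities $x_i$ are restricted to nonnegative integers, the recurrence $f_1(x) = g_1(x)$, $f_N(x) = \max_{0 \le x_N \le x} [g_N(x_N) + f_{N-1}(x - x_N)]$ can be tabulated stage by stage. The Python sketch below illustrates this; the integer grid, the function name `allocate`, and the three return functions in the usage lines are assumptions made for the example.

```python
def allocate(returns, total):
    """Maximize sum_i returns[i](x_i) subject to sum_i x_i = total,
    with every x_i a nonnegative integer.

    Returns (maximum value, [x_1, ..., x_N]).
    """
    f = [returns[0](x) for x in range(total + 1)]  # f_1(x) = g_1(x)
    choices = []                                   # argmax tables, stages 2..N
    for g in returns[1:]:
        new_f, arg = [], []
        for x in range(total + 1):
            # f_N(x) = max over 0 <= x_N <= x of g_N(x_N) + f_{N-1}(x - x_N)
            best_xn = max(range(x + 1), key=lambda xn: g(xn) + f[x - xn])
            arg.append(best_xn)
            new_f.append(g(best_xn) + f[x - best_xn])
        f = new_f
        choices.append(arg)
    # Walk the argmax tables backwards to recover x_N, ..., x_2, then x_1.
    allocation, x = [], total
    for arg in reversed(choices):
        allocation.append(arg[x])
        x -= arg[x]
    allocation.append(x)
    allocation.reverse()
    return f[total], allocation

g1 = lambda x: 5 * x - x * x
g2 = lambda x: 3 * x
g3 = lambda x: 8 * x - 2 * x * x
print(allocate([g1, g2, g3], 4))  # prints (16, [1, 2, 1])
```

The work grows as $N$ times the square of the grid size, in contrast to the exponential growth of a direct search over all allocations.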

Remarkably, this equation can be solved by means of transform
techniques. Write

$$F(y) = \max_{x \ge 0} [f(x) - xy] = M(f),$$

the maximum transform. Then, under mild constraints on $f(x)$,

$$f(x) = \min_{y \ge 0} [F(y) + xy].$$

Furthermore,

$$M\Bigl(\max_{0 \le x_1 \le x} [g(x_1) + h(x - x_1)]\Bigr) = M(g) + M(h).$$

We thus have an analog to the Laplace transform, particularly with
reference to its ability to disentangle convolutions.
Applying the maximum transform to Eq 22, we have

$$M(f_N) = M(g_N) + M(f_{N-1}) = \sum_{i=1}^{N} M(g_i).$$

Hence

$$f_N = M^{-1}\Bigl[\sum_{i=1}^{N} M(g_i)\Bigr].$$

For detailed discussions, see Bellman and Karush,
where the connection with duality theory and convexity is discussed.
Further results are found there.
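The property that the maximum transform turns the maximum convolution $\max_{0 \le x_1 \le x} [g(x_1) + h(x - x_1)]$ into a sum can be checked numerically on a grid. The Python sketch below is an illustration only: the functions are sampled at the integer points $x = 0, 1, \ldots$, the transform variable $y$ is also restricted to a few integer values, and the names `max_transform` and `max_convolution` are assumptions of the example.

```python
def max_transform(f_values, ys, dx=1):
    """F(y) = max over grid points x = i * dx of [f(x) - x * y]."""
    return [max(fx - i * dx * y for i, fx in enumerate(f_values)) for y in ys]

def max_convolution(g_values, h_values):
    """Grid version of max over 0 <= t <= x of [g(t) + h(x - t)].

    The result is defined at every grid point x that can be written as
    t + s with t on the grid of g and s on the grid of h.
    """
    n = len(g_values) + len(h_values) - 1
    out = []
    for i in range(n):
        lo = max(0, i - len(h_values) + 1)
        hi = min(i, len(g_values) - 1)
        out.append(max(g_values[t] + h_values[i - t] for t in range(lo, hi + 1)))
    return out

xs = range(6)
g = [6 * x - x * x for x in xs]
h = [4 * x - x * x for x in xs]
ys = range(7)

lhs = max_transform(max_convolution(g, h), ys)
rhs = [a + b for a, b in zip(max_transform(g, ys), max_transform(h, ys))]
print(lhs == rhs)  # prints True
```

On the grid the identity is exact, because maximizing over $x$ and then over the split of $x$ is the same as maximizing over the two parts independently.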


4 3 Stochastic Multistage Decision Processes 

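Returning to the abstract multistage decision process described in
Sec 4 1, let us suppose that both the return, $g(p, q, r)$, and the
transformation, $T(p, q, r)$, are stochastic, depending on a random variable
$r$ with known distribution function $G(r)$. In this case, setting

$$f_N(p) = \max \exp R_N,$$

where exp denotes the expected value, we obtain the functional equation

$$f_N(p) = \max_q \int \{ g(p, q, r) + f_{N-1}[T(p, q, r)] \} \, dG(r), \qquad N \ge 2,$$
$$f_1(p) = \max_q \int g(p, q, r) \, dG(r).$$

In the same fashion, we can treat multistage games (Bellman, 1957).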


4 4 Adaptive Processes 
Let us now introduce a further 

process contains a number of “f "““"/'“"vision may not be known, 
'"in a "situation of this W'’^■‘;„^tuat «Umrs'?or Tht '"kCn^a- 

tiSs and“quaMms"wry% as Th^rheoi^'^ 

and Kurland (19biJ. ,„^ched Ictus 

and Tou (1963) ,h,s nature arc ^proach 

eo^" ' 
Consider two machines, one with a known fixed probability $p$ of yielding a
return of 1 unit and the other with an unknown probability of
yielding a return of 1 unit. Given $N$ opportunities to try the machines, how
should we apportion our trials so as to maximize the expected gain?

Let $r$ represent the unknown probability of success on the
second machine, and $dG(r)$ an a priori probability distribution
function for $r$. After $m$ successes and $n$ failures on the second machine,
we agree to transform $dG(r)$ into the a posteriori probability distribution
function

$$dG_{m,n}(r) = \frac{r^m (1 - r)^n \, dG(r)}{\int_0^1 r^m (1 - r)^n \, dG(r)};$$

then the probability of success after $m$ successes and $n$ failures is

$$p_{mn} = \frac{\int_0^1 r^{m+1} (1 - r)^n \, dG(r)}{\int_0^1 r^m (1 - r)^n \, dG(r)}.$$

Let

$f_{m,n}(N)$ = the expected return obtained using an optimal policy for $N$
remaining trials, having previously observed $m$ successes and
$n$ failures.

Then the principle of optimality yields the functional equation

$$f_{m,n}(N) = \max \bigl[\, p + f_{m,n}(N - 1), \;\; p_{mn}[1 + f_{m+1,n}(N - 1)] + (1 - p_{mn}) f_{m,n+1}(N - 1) \,\bigr].$$

This process has been discussed; see, for example, Bellman
(1956) and Karlin, Bradt, and Johnson (1955).
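The two-machine problem can be computed exactly for small $N$. In the Python sketch below, $f_{m,n}(N)$ is the larger of $p + f_{m,n}(N-1)$ (play the known machine) and $p_{mn}[1 + f_{m+1,n}(N-1)] + (1 - p_{mn}) f_{m,n+1}(N-1)$ (play the unknown machine), with $f_{m,n}(0) = 0$. As an assumption of this sketch only, the a priori distribution of $r$ is taken to be uniform on $[0, 1]$, in which case $p_{mn} = (m + 1)/(m + n + 2)$; the function name `bandit_value` is likewise hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def bandit_value(p_known, trials):
    """Expected return of an optimal policy over `trials` plays.

    Machine 1 pays 1 unit with the known probability p_known.
    Machine 2 pays 1 unit with unknown probability r; r is assumed
    (for this sketch) to have a uniform prior on [0, 1], so that after
    m successes and n failures p_mn = (m + 1) / (m + n + 2).
    """
    p_known = Fraction(p_known)

    @lru_cache(maxsize=None)
    def f(m, n, remaining):
        if remaining == 0:
            return Fraction(0)
        p_mn = Fraction(m + 1, m + n + 2)
        use_known = p_known + f(m, n, remaining - 1)
        use_unknown = (p_mn * (1 + f(m + 1, n, remaining - 1))
                       + (1 - p_mn) * f(m, n + 1, remaining - 1))
        return max(use_known, use_unknown)

    return f(0, 0, trials)

print(bandit_value(Fraction(1, 2), 2))  # prints 13/12
```

With two trials and $p = 1/2$, playing the known machine twice yields 1, so the value 13/12 reflects the gain from trying the unknown machine first and using what is learned on the second trial.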

References

tVoiit, 19™6,®3-68”' m the iheoiy offun 

* 1 ' i, of some probl ' Math 

A«; f >p. 351-3sr“'™ Bo-ux a„e I. ^ 

— Ho.fop,e 
Connectivity, axiom of, 260, 276, 290
Conservative system, 469 
Constant apparent brightness, index of, 
144 

Constant outcomes, 298 
Constant utility model, 332-337 
Contiguity principle, 210 
Continuity, axiom of, 260 
condition of, 286-287, 499
definition of, 266-267 
equation of, 15-16 
stochastic, 350 

Continuous space process, 417 
see also Stochastic process 
Continuous time Markov chain, 464- 
470 

see also Markov processes 
Contours of equal hue and luminance, 
144-145 

Contrast effect, 72-73 
Contrast interaction, 142 
Control theory, 494, 506 
Convergence, in distribution, 423
in measure, 424 
Convexity, 509 

Coombs' random utility model, 387- 
389 

see also Utility model 
Covariance, 449 
Criterion responses, 144
Critical band, auditory, 70-71, 83-86 
visual, 52, 54-55, 70-71 
Critical event, 234-235, 237, 240-243 
Critical flicker frequency, 107-113 
and brightness enhancement, 111
Critical flicker frequency, and intensity
discrimination, 109-110
effect of temperature on, 107-108,
110
light-dark ratio in, 110-112
and luminance, 111-112
and neural physiology of, 111-112
organic invariance of parameters of,
110
parametric analysis of, 110
Critical fusion frequency, see Critical 
flicker frequency 
Critical ratio, 52, 70-72 
Critical spectrum width, 52 
Cue(s), adaptation of, 170, 216, 2 
adapted, 219 

additivity of, 222-223, 230-231, 239
common, 167-169 
conditioned, 170, 218-219 
conditioning of, 207, 214- 
correct, 224 
discriminative, 167 
irrelevant, 169-170, 215-216, 219, 
223-224, 242 
observed, 167 
partitioning of, 215-21
redundant, 220-222
relevant, 169-170, 215-216, 219,
220-222, 233 
suppressed, 170-171 
wrong. 224 ^„oceDt Stimulus 

iee aho Category, 

element, S.tnat.onal va™bte. 

and other 

Cue conditionine model, 21 

axioms of, 216-21 222-223 

and delayed remtore«menl,^^ 
and individual 222-223 

and partial remtorcctncnl, 2 

response bias m, 

Cumulalive d.stnhutmn 
Curves, see speciflc topics 

Dark adaptation, 

Death ^ultistatc. 994 

Decision PtP““^ moUntat' 

see«lsoDecii.tm<6~"',56 
Decision theory. 4’- jop-SIO 
muhistase. 494. ■> 


Decomposition assumption, 363 367, 

ex^nmestal tests of, 390 tn, 39M97 
Density of innervation, see Neural
density
Density function, 421-423
neural, 55, 68
see also Distribution
Denumerable Markov chains, 458-
464, 467-471
Detection, 163 
Diabolic subject, 383 
Dice game, 396-397 
Differential equations 49 . 

Dimensions, color, 113 

relevant 242 ,5 220-222 

stimulus, 207, 21 
Direct conditioning 

Ml SI«S"“'= 

puriiy, 146 

of stimuli. 75«33 
rctoDliferemialscmi.ivi.y.J''" 
noticeable difference, and 

Threshold 

Discrimination leammi.^ 

experiments in, 

DtJCfiminaiion opera 
Discriminalivc .tmmlus 17.. 
Displacement, assumption of, 126
of cochlear partition, 32 
Distribution, 421-423 
discrete, 422 

fundamental for vision, 125-129 
Gaussian, 104, 109, 422-423, 430-
431, 503

geometric, 445-446 
infinitely divisible, 431 
logistic, 335 
luminance, 103-104 
of neural excitation, 72 
normal, 104, 109, 422-423, 430-431,
503

Poisson, 422-423, 430-431 
of sensation, 78-80 
spectral, 126, 129, 136-137 
see also specific topics 
Drive, 252 

Dual process models, 166-175 
Bush-Mosteller, 167-169 
for Identification learning, 166 
Restle, 169-172 
stimulus sampling, 175 
Wyckoff, 172-175
Dual standard sequence, 270
Duality theory, 509
Dynamic programming, 494, 506-509


Ear, transmission characteristic of, 4,
37, 42-46, 67
Ear canal, pressure transformations in,
Eardrum, 40
acoustic resistance at, 43
effective area of, 40
impedance at, 41-42 

""P'dance al, 43 
Effect, law of, 179 

Effect of reward, direct, 180-181 2 

generalized, 180-181, 201
Elasticity, bulk modulus of, 15
Electrical impedance, 12

Electroacoustic analogies, theory 

Electroacoustic transducer, 5-6 s 

■:'«rom.ba„ieaU„.i„,.:;’,^ 

Elements, see specific topics


Elliptic functions, 500-501
Equally spaced alternatives, axioms for 
277-278 

Equation, see specific topics 
Equilibrium, see Asymptotic behavior 
and specific topics 

Equipment and replacement theory, 506 
Equisection experiments, 47 
Equivalence relation, 114-116, 271 
Ergodic chain, 447 
limit theorems for, 449-452 
Error(s), expected number of, 221-
222, 227-231, 244-245 
last, 235-237, 244-245 
role in learning, 245 
role in strategy selection, 224 
total, 239-240 

Error trials, learning on, 243-245 
Esthetic judgments, 307-308 
Excitation, neural, 58-59, 90-91 
function, 122-123 

Expected, see Errors, Utility, and other 
specific topics 

Experiment, see specific topics 
“P^^f^enier controlled event model. 

Experimenter controlled linear learning 
model, 188-189, 371, 376 
Exponential functional equation, 498 


uiuuei, 4U1 

Fechner’s (psychophysical) law, 49,
120-121, 129, 134
Feedback control processes, 507-508
Ferry-Porter law, 111
Fidelity of optical systems, 103
Finite Markov chain, 440-458
ergodic behavior of, 447-452
transient behavior of, 442-447
Finite difference system, 277-278
Fixed point, 189-190
Flicker contours, phylogenetic, 110
Flicker frequency, 107-108, 144

Fo^h"'? frequency 

Forced^nic, experiment, k-allernalive, 
*63-164 


Forced oscillation, 8-10 

°Tor209 

Form invariance assumption, 126, 129
Forward equation, 465-466, 469-470
Fourier analysis, 8, 112 
Fourier theorem, 7-8 
Fourier transforms, 104 
Foveal tritanopia, 140 
Fractionation, method of, 46-47 
Frequency, j n d for, 47-54, 68 
modulation of, 47-48, 50-51 
sensitivity to, 69 
spectrum of, 57-58 
Function, Bessel, 31 
Cantor, 439-440 
characteristic, 297 
elliptic, 500 
excitation, 122-123 
gamma, 493-494 
Hankel, 31-32 
linear, 499-500 
linearizing, 494-496 
loudness, 80-93 
monaural, 83, 85 
luminance, 144-145 
multiplicative, 502 
order preserving, 268-269
outcome, 415 

of Brownian process, 437 
psychophysical, 49, 105-106, 109,
120-121, 129, 134
simple harmonic, 6, 17
sinewave response, 103-104
spectral distribution, 126, 129, 136-
137
transfer, 22-23, 103-104, 113
utility, see Utility function

Functional equations, 
approximate, 501 
causality and, 497-4
Erdős, 501-502
exponential, 498
recurrence, 506
punJamcntal, Jrc *1^'^ ^ 

-runnclme" process, lt» 

t •l‘*7-3'0. ’67- 
GamHint 391-40: 

”''■,”‘’10 493-i” , 

Gamma function. 4..- 

Caussiao >l's'»*^“;’‘’;2, 

423, 430-131. ipo-iro 

Ceneraltratioo „,.<s 

GrncralirrsI, are *> 


Geometric distribution, 445-446
Guthrian tradition, 210

Hankel function, 31-32

Hearing, physiological mechanisms of,
3
see also Auditory
Helmholtz line element, 120-124
Herstein-Milnor system of utility,
Heterochromatic brightness matching,
144
Higher neural center, 87-88
Higher ordered metric, 272-279
Hooke’s law, 15-16, 489

Hue, relation to '“"""“"“•j 
spectral discrimination of, 

Ideal ulihty poi"' 

,den..r.ea..ontorn.ng 65-:»- 

and beta J66-I75 

impedance. lO-H 
acoustic. 13 ^ 

acoustic input -i 
of cochlea 35-37. 40 ^ 

of cochlear partH'on- -6 - 
at the car, I4 
at the eardrum 1 

Hcelrieah 12 

specific. I- 

onmlin, re-peme 

ineeoii.e.^': ,11.3s: 

Incrcmem 1 .es 

Indepc*^ ' 
ee-s! -^es c!. 

O* pi‘h 1 6r-1.4 
Independent increment, 420-421
Independent increment processes, 436- 
440 

Brownian motion, 436-457 
Poisson process, 437-438
stationary increments of, 421
Independent processes, 428-440
Independent stochastic process, 416-419
see also Stochastic processes
Indifference, 292, 310
Indifference relation, nontransitive,
279-281 
transitive, 260 
Inductance, 11
Inequalities, study of, 501 
Infinite difference structures, 277 
Infinitely divisible distribution, 431 
Information theory, 413, 420, 481-482 
Inhibition area, 73-80 
in the ear, 80
extent of, 77-80
in the eye, 80
in the skin, 73-75, 80
Inhibitory interaction in eye of Limulus,
73
Input impedance, see Impedance
Instantaneous threshold, 106
Insurance, 282 
Integral equations, 490 
Intensity, 38-39 

discrimination of, 80, 106-107, 120- 
121, 125-126 

Interleaving of presentation sets, 377-
378, 395-396
Interpolation problem, 493-494
Interval scale, 272, 276, 284, 307, 310,
314, 333

Introspection, 273 
Invariant, relative, 494 
Invariant imbedding, 494, 505 
Inventory, 506 

Irrelevant, see Categories, Cues, and 
other specific topics 
Isometry relation, 291
Isosensitivity curves, see ROC curves
Iterated logarithm, law of, 435-436
Iteration, 492-494, 497
of a matrix, 504
multiplicative characteristic of, 493
stochastic, 496, 503


JND, see Just noticeable difference 
Judgment probability, 362 
Jumping stand experiment, 163 
Just noticeable difference, assumptions 
about, 279-281
and Fechner’s problem, 495
frequency, 47-54, 68
intensity, 80, 106-107, 120-121, 125-
126
subjective size of, 80
see also Discrimination, Semiorder,
and Threshold

Large numbers, law of, 464 
strong law of, 424, 432 
weak law of, 429, 431 
Lateral neural inhibition, 73-75 
Law, see specific topics 
Learning, 504, 509 
animal, 164-166, 186 
nongeneralization and, 187 
perfect, 168, 173-174, 184, 186-187, 
201, 219 

see also Asymptotic behavior, Beta
model, Conditioning, Linear
operator model, Perfect learning,
and related specific topics
Learning curves, 165
Learning process, 164, 504-506
see also Conditioning process
Learning rate, 209, 214, 219, 239, 372-
373
Learning theory, 367-368, 424-425
see also Learning and specific topics 
Least preferred outcome, choice of, 
356-358 

Less probable than relation, 294-296 
Level of aspiration, 304-305 
Lexicographic ordering relation, 261, 
264 

Limit law, see Limit theorem 
Limit theorem, 428-432, 464, 502
see also Central limit theorems
Limulus eye, 73
Linear function, 499-500
Linear operator model, 167-168, 217,
425-427, 455-457, 474-475
basic probabilities for, 425
experimenter-controlled, 188-189
path independence of, 176-178
Linear operator model, and position
habits, 189-193
single process, 185-193
with symmetric operators, 186-187
Linear programming model, 317-319,
384, 401 

vs actuarial model, 319-320 
criticism of, 319 
methods for, 401 

and newspaper content preferences, 

384 

probabilistic version of, 320 
Linearizing functions, 494-496 
Line element, 120-130 
in color space, 128-129 
test of theory, 150 
Logarithmic decrement, 9 
Logarithmic 3 space, 124 
Logistic distribution function, 335
Longitudinal plane waves, 16 
Loss matrix, 375 
Loudness, 4 

and burst duration, 90-93
contribution of critical band , 
of intermittent white noise, 92-^ 
of masked tone, 81-86 
of a sinusoid, 84-86 
and stimulus duration, 90-93
Loudness function, 80-93
constants of, 88
and critical band, 83-86
for 1000 cycle tone, 88
Loudness matching, 81
Luminance, 135 

additivity of, S'*. ''^‘,40 

and response strengt 
and saturation. 139 

Luminosity. I-'- ' 

Mach bonds J”’’ '°^(loudncss). 

r,r”ud:c.t.n"a^ 

hlarjinll nickct 

454 ._j7t 

coniiniioos time 


Markov chains, denumerable, 458-464, 
467-471 
finite, 441-458 

law of Iterated logarithms for, 450- 
451, 454 

law of large numbers for, 450-451
and nonstationary transition
probabilities, 483

Markov process 418-421, 42 
440-473, 504-505 

and dynamic programming, 506-510
random walk, 505
see also Markov chains and Stochastic
processes
Martingale stochastic processes, 418-
419, 421, 475-478
convergence theorem for, 477-478
system theorem for, 476-4

Mass, of cochlear Partd^l 
Matching color, 114-120, uu 

l^frcolordiscriminaltonand 

Color misiure 

Matrix, fundamental 444-446 
iteration of, 504 

ptyoff. IW. 182. 393 

:r'SioTi.0-7;';7'>„^3 

j:”::umrr%= go. 

Maze experiment^ 

186 230--31 
Mean firs. passaS' 

tope 

i;raS-..^c. 0 P« 

and phtsiological findings. . 
Metameric eolors 119 
Mcl.thetie continuum fo 
Method 

MetiK operation 

MetfK seahnt 4-0 

MettKally iransilise prnee’ 

Middle eat 40-41 

„,.o.l. .nalogue o' 

*ouod l-x. f 
Minimal process, 470, 472
Minimally discriminable differences,
12U125
see also Discrimination, Just noticeable
difference, and Threshold
Minimax strategy, 302-303, 326, 402 
Minimax theorem, 302 
Mixture spaces, 284-290 
Model, see specific topics 
Moderate stochastic transitivity, 340, 
344-346 

see also Stochastic transitivity 
Monaural loudness function, 83, 85 
Monte Carlo runs for strong stochastic 
transitivity, 386-387 
More probable than relation, 294-296 
Mosteller Nogee experiment, 310-314 
criticism of, 314 
Motivation, 252 
Moving average, 480 
Multistage decision processes, 494, 507, 
509-510 

Multiplication condition, see Product 
rule 

Multiplicative condition, 297, 341 
and other properties and models 
343-347 

Multiplicative function, 502 
Munsell color notation, 133-134

N-bets assumption, 320
Nervous system, response of, 58
see also Neural activity and related
topics
Nets, 272

Neural activity, 87-88, 90
peripheral, 90
spontaneous, 59, 72, 89-90
Neural availability, 110
Neural behavior, specific properties of,
101
Neural density, 35, 68
Neural effect of visual excitation, 109
Neural excitation, 58-59, 90-91
Neural firing, 88-90
Neural refractory period, 113
Neural response, characteristics of, 87-
88, 111
to pairs of short pulses, 60
Neural responsiveness, 136
Neural unit, 5, 73, 104
see also Response unit
Newspaper content, preference in, 383- 
384 

Newton’s second law, 17 
Noise, 5, 84 

Nonconfusable stimuli, 201-202 
Nonadapted, see Categories and Cues
Nongeneralization, assumption of, 184
Nonlinear programming model for
measurement of utility, 319-320
Nonreversal shift experiment, 223,
242
see also Reversal shift experiment
Nonreward, effect of, 189, 193
Nonsatiety axiom, 260 
Normal distribution, 104, 109, 422- 
423, 430-431, 503 

Observing response, 172-175, 369-370 
see also Orienting response 
Observing response model, 172-175,
369-370, 376, 396
Objective probabilities, 284, 291
estimation of, 322-325 
and subjective probabilities, 323-326 
see also Response, Stimulus, and 
special topics 

One element stimulus sampling model,
applied to concept utilization,
212-214, 246
applied to gambling experiment, 360,
368-371
applied to paired associate learning,
209-211, 246
applied to probability prediction
experiment, 368-370, 395-396
asymptotic behavior, 369
axioms for, 209-210
experimental test of, 395-396
Open conditions, 269, 276
Open set, 266
Operations research, 506-507, 509
Operators, identity, 169
symmetric, 181-183, 186-187, 189-
200
see also Beta model and Linear
operator model

Opponent process model, 135-144
Optical imagery, 103-104
Optimal strategies, 300-301
Optional stopping system (of play), 476 
Order preserving function, 268-269 
Ordered metric, 272, 274, 307-310 
additivity of, 309 
experimental tests of, 309-310 
Ordinal, see Scale and specific topics 
Oriented behavior, 108 
Orienting response(s), 172-175, 369- 
370 

Ossicular chain, 40 
Outcome(s), certain, 257, 331 
conditional schedule, 257 
consistency of, 180-181 
constant, 298 
elementary, 331 
preference over, 178-179 
set of, 164 

uncertain, 257, 281-306, 359- 
see also Payoff 
Outcome function, 415 
of Brownian process, 437 
Oval window, effective area of, 
Overlearning, 202


j vs JT, asymptotic values for, 390- 
399 t#;i_l64 

Paired associates experiment, 

207, 209-212. 2^6 
Paired comparisons, ° 3Q1 

Pareto optimal strategies. 300-Jt. 

Pan mutual betting, 322-^^^ 

Particle velocity, 19, 2l 
Partitioning of cues, 2 

of straleEiss, 225 ,75,178 

Path "“7^28 452-455 

Pattern model, 427 

and level of nsk,325-«» 

Payoll matrix. 164, ,_435 476- 

Penny-matching eame. 

^32 , , in7-104 

Perceived brightnes^, 

Perceived color. • ,„diK: 

Perceived color E»mut 

non. 142-143 

perceplion ,87 

Perceptual process. 


Perfect learning, 168, 173-174, 184,
186-187, 201, 219
see also Asymptotic behavior
Period, 7
Periodicity pitch, 46
Peripheral receptors, photochemical
events at, 101
Perseveration model, 23
Phase, 6, 34-35
sensitivity to,
velocity, 16, 35

Phaser, 6-7 ins-I08 

Photochemical theory, . 
Phoiopigment concentra ’ * _ „f 

pSeLplor process, descriplmn of. 

105-107 141-142 

Physiological 

Pinball machine experiment. 

Pitch, 3, 46-55 
Place leamrog, 24u 
vs response learning, 

Place theory of hearing, 52-54, 8 

Plane wave, 19 ,22-423,430-431 

Poisson process 436-438 

Poker hands, 311 

Policy, 507 

optimal. 507 198,201. 

Position habits, 1® ’ ^55 

460 

Preference, 5 396-397 

:r,Xenr.;fl.ke..<^, 363-364 

for shades of gray.^J^W 

J"rn«."-rx.-dl«l,onof 

255-258 ^ 

origins of. 25- - .,35,236 

Prcsolution 

pressure, auditory. 3*-. 
Principle of optimality, 507
Prisoner’s dilemma, 300-301, 303 
Probabilistic choice theories, 331-350 
359-377 

and asymptotic behavior, 377-378 
constant utility models for, 332-337 
random utility models for, 337-339 
response probabilities in, 331-332 
transitivity in, 339-342
for uncertain outcomes, 359-377
decomposition assumption in, 362-
367 


expected utility models, 359-362 
Probabilistic ranking theories, 351-358 
Probability, basic, 415-417, 425, 427-
428
of errors in strategy selection model,
225-226
objective, 284, 291, 323-325
response, 164, 168, 171-172, 332
subjective, 293, 321-327
see also Subjective probability
theory of, 421-424, 502-506
transition, 210-211
see also Markov and Stochastic
processes and specific topics
Probability matching hypothesis, 369,

Probability measure, 293 
Probability prediction experiment, 390- 


Probability preference, 327-328 
Process, see specific topics 
Product rule, 341-346, 379 
Prothetic continuum, 80, 82
Psychiatry, 509 

Psychophysical experiment, 254 
characteristics of, 144-150 
response bias in, 255
see also Discrimination, Magnitude
estimation, Threshold, and
specific topics
Psychophysical function, 105-106
Fechner’s, 49, 120-121, 129, 134
Pulse sequence, 57-58
Pure risk, 257, 330


Quadruple condition, 275, 341, 343-
346, 350
Qualitative probability, axioms for,
294-296 

see also Subjective probability 
Quasi ordering, 274 
Queues, 437-438, 483 

Random utility model, see Utility 
model 

Random variable(s), 421-424
uniformly bounded, 429
Random vector, 337-338
Random walk, 194-197, 440, 455, 460,
462-463, 505-506
Ranking, probabilities of, 351-359
Ratio scale, 152, 335-336 
Rational behavior, 256 
“Rational" man, 253 
Reactance, 11 

Receiver operating characteristic curve, 
see ROC curve 
Receptor process, visual, 105 
Recognition experiment, 163-166, 184,
201 

Recognition learning model, 188 
Recurrence equations, 492 
Recurrence functional equation, 506 
Recurrence theorem, 435 
Redundancy, relevant, 220-222 
Redundant, see Categories, Cues, and 
other specific topics 
Reflexive relation, 274 
Refractory period, neural, 113 
Regret matrix, 303, 375 
Regularity condition, 342-344, 346 
experimental test of, 400-401 
Reinforcement, effect of, 180-184, 186-
187
in maze experiments, 180
partial, 169
schedules of, 425-427
in shuttle box, 180
theory of, 179
Reissner’s membrane, 24
Relation, bisymmetric, 291
color equivalent, 114-116 
equivalence, 114-116, 271 
immediate successor, 278 
indifference, nontransitive, 279-281 
transitive, 260
Relation, isometric, 291
less probable than, 294-296
lexicographic ordering, 261, 264 
more probable than, 294-296 
nontransitive, 279-281 
preference, 260, 262-263 
psychophysical, 49, 105-106, 109, 
120-121, 129, 134 
reflexive, 274 

transitive, 260, 270, 276-277, 297 
violation of, 279-281, 309-310, 

319 

weak ordering, 260, 263, 270, 275, 
285, 298 

see also specific topics 
Relative expected loss minimization 
rule, "hlS-Zn 

RELM rule, see Relative expected loss 
minimization rule 
Relative invariant, 494 
Relevant, see Categories, Cues, Dimension,
Redundancy, and other
specific topics
Renewal theory, 438-439
Replacement theory, 506
Resistance (acoustic), 11
of cochlear partition, 30 
Resolving power of auditory sys 
Response(s), implicit, 36 
observing, 172-175, 369-370 
orienting, '72-175, 369 
probability of, 168, Hi 

conditional j „„cept 

Response nxiom(s), aPP' 

Utilization, 

„[ cue conditioning model, 

of one element modrl lim 

of Restle’s model ‘j* . ^34 

for stimulus *®*'^*‘° 225 

for strategy scRctionm^^^l. 

Response bias, 719. 

Response learning 3 , ,40 

vs place learning 23 
Response set, 35_g0 

Response unit, 73 . 

also 

Restle model, 169 

Restoring force, ' 223, 242 

Reversal sbifl experiment. 


Reward and nonreward, effect of sym 
metric, 181-182 

equality of effect of, 183-184, 200 
generalization of, 181, 187 
relative effect of, 180-181, 186 
see also Reinforcement 
Riemannian metric, 131, 133 
Risk, 257, 330 
ROC curves, 165, 188 
R-order dense set, 263 

Sample patch of Brownian process 437 

Savagc-sax,nmsforntd,.yandsnb 

jective probability, 298-299 
Scale, attitude, 268-272 
cardinal, 259 

interval, 272, 276, 284, 307, 310 
314, 333 

o^rtaU59-76f772,284,333 

ratio, 152, 335-336 
subjective, 332-333 

''S‘'s=«=. 

estimation 

Selection conditioning model. 23 
Selective listening, 84 
Semigroups. 494, 49 , 

Scmiorder, 280 
Sensation fnndamental, 121 

^dSrTbnUon over, 79^0 

lion and Threshold 

^:::X'colorm'mmandNla.=bmg 

. „ wel masking noise m 81-8- 

SpSaMe.opo.og.c.lspaca.766 7^^ 

standard 278 
Sequential analysis. 506 
Set, see specific '»P'“ 

Shift transformation, 479 
Shnttlebox espermeet m 
Similarity index, 167-169 

Simple harmonic function, 6, 17
Sinewave response function, 103-104
Single element assumption, 110
Single process models, 176-201
beta, 193-201 
linear, 185-193 

experimenter controlled operators 
for, 188-189 

with symmetric operators, 186-187
nongeneralization, 184-185 
Situational variables, 170 
see also Categories and Cues 
Skeleton chain, 465 
Skewness of bets, 329 
Skin sensitivity, 73-76 
Solution of equations axiom, 270 
Sone, 82 

Sound, diffraction of at listener’s head 
46 

frequency of and pitch, 46 
intensity of, 21 

transmission of, in the cochlea, 24-38 
in the ear, 21-46
in the middle ear, 22, 38-43 
Sound pressure, 13-14, 19, 21 
measurement of, 21-22 
and particle velocity, 21 
Space, base for, 266 

color, see Color space and Chroma 
ticity space 
connected, 271 
logarithmic 3 , 124 
mixture, 285-290 
separable, 266, 271
topological, 266-267, 287
see also Topological space
Spatial contrast, 101
Spectral distribution functions, 126,
129, 136-137
Spectral measure, 462
Spectrum, frequency, 57
Speech, approximation to, 420
Spherical wave, 19
Spontaneous neural activity, 59, 72,
89-90
Stability, 501
Stable law, 431
Stapes footplate, area of, 40
State(s), absorbing, 189-193, 442
equilibrium, in photoreceptor process,
103-106


State(s), for finite Markov chains, 442 
of nature, 282, 298 
stable, 467 
transient, 442 
State variables, 491 
Stationary processes, see Stochastic 
processes 

Stationary transition probabilities, 419 
see also Markov process 
Statistical decision theory, 252-253, 

256 

multistage, 494, 507, 509-510 
Steady state behavior, 10, 255-256, 495
equation of, 106-107
see also Asymptotic behavior,
Equilibrium state, and Perfect
learning

Stimuli, additivity of, 5, 490 
confusable, 164-166 
differentiation of, 209
dimensions of, 214-215, 220-222, 233
discriminative, 172, 254
distribution of in cochlea, 75-77
generalization of, 179
identity of, 80-81
mathematical description of, 5
presentation probability of, 164
at sensory cell, 22, 27
time pattern of, 57-58
see also Category and Cue 
Stimulus axiom, of cue conditioning 
model, 216 

of one element model, 209, 246 
of stimulus sampling model, 209-210 
of stimulus selection model, 234 
for strategy selection model, 225 
Stimulus categories, 233-234 
Stimulus component model, 455 
Stimulus elements, 167 

conditioned subset of, 167-168
Stimulus power, additivity of within 
critical band, 5 

Stimulus presentation, 163-164, 166- 
167 

Stimulus sampling model, 175, 209-
210, 368-370, 427-428, 452-457
axioms for, 209-210
basic probabilities for, 427-428
for paired associates learning, 209-
213, 246
Stimulus selection model, 233-235
Stimulus transformation, 22, 87
Stochastic continuity, 350
Stochastic iteration, 496, 503
Stochastic theories of preference, 256 
Stochastic process(es), 175, 256, 413- 
484 

abstract space, 417 
classification of, 417-421 
continuous space, 417 
definition of, 414-417 
discrete space, 417 
independent, 418 
in information theory, 413 
Markov, 418 
Martingale, 418 
see also Martingale stochastic 
process 

nonbranching, 171-172
path dependent, 175 
path independent, 169, 176-1 
stationary, 418 
see also Markov processes 
Stochastic transitivity, ^ 

experimental tests of, 380-390 
mild, 340 

moderate, 340, 344-346
and multiplicative condition, 3
strong, 340, 344-345, 364-365, 384-
weak, 342, 344-345, 384, 386-
Strategy(ies), 223-224, 300
classification of, 225
equilibrium, 301-302
maximin, 302-303
minimax, 326, 402
mixed, 302 , 

Pareto optimal. 3 225-227 

sampling probability of, 225 

selection of, 224-233, 246 

Strategy selection 

additivity of cues, 233 

and discrimination eep-in"’ 

experimental tests of, 276-277 

Strict higher ordered metric 

see also Utility model 

Strict utility model, j 370 376 

Strong conditioning model. 


Strong higher ordered metric, 276-277 

see Higher ordered metric 

Strong law of large numbers, 424, 432 
Strong stochastic transitivity, see Stochastic transitivity 

Strong utility model, see Utility model 

Subjective expected utility model, see Utility model 

Subject experimenter controlled beta model, 371-377 

Subjective probability, 252 

axiom system for, 291 29^ 
of compound event, 326 

^L^^a^'---.32l- 

327 

Psychophysical relation and 
specific topics 

Subject's knowledge of outcome, 378 

Subliminal differences, 170-171 

Suppression of irreleva 

Sure thing principle 

Surprise, potential, 

Survival of family names, 463-464 

Symmetric operators, m-183 

in beta model, 186-187 

linear operator model, 186-187 
Systems engineering lUl 

Temporal summation, 4, 65, 90-93 

auditory, 3, 5, 70-72 

:::2rsd!oJ^a.ion,68 

rrn."->3<. 

inslantaneous, 106 

‘"°4le difference, and Sem, order 

TOr^bold model. aPP^Iokambling 

2.„r:rSl.V3g3 384 389 

Time^ependen. "/jp, 

Time history of system, 491, 504 

T maze experiment 180. .40 

Tone burst, 96-93 
Topological space, 266-267, 287 


base for, 266 
connected, 271 
separable, 266, 271 

Transfer function, 22-23, 103-104, 113 
Transform, maximum, 508-509 
Transient response, 10 
Transient state, 441-442 
Transition matrix of probabilities, 210-211, 504 

Transitive process, metrically, 479 
Transitive relation, 260, 270, 276-277, 297 
violation of, 279-281, 309-310, 319 
Transitivity, stochastic, see Stochastic 
transitivity 

Transmission characteristics of ear, 4, 37, 42-46, 67 
Triads, method of, 307 
Trial independence, 210, 332 
Trials to last error, 235-237, 244-245 
Triangle condition, 340, 342-344, 350 
Tritanopia, 140 

Two person game method, 325-326 

Uncertain outcomes, 257, 281-306, 359-377 

Unfolding technique, 279, 307-308, 329, 387-388 

Uniformly bounded variable, 429, 436 
Unilateral pair, 388 
Unilateral triple, 388 
Uniqueness, see specific topic 
Utility, Bernoullian, 258 
cardinal, 259 
defined, 258 
expected, 281-284 
of gambling, 330 
maximization of, 258, 282, 295 
ordinal, 259 

Utility curves, see Utility function 
Utility difference(s), 272-279 
measurement of, 308-310, 315-319 
Utility function(s), 258-306 
additivity of, 267-272 
continuous, 264-265, 267 
discontinuous, 264-265 
equally spaced, 315 
existence of, 261-264, 267, 271, 277-278, 280, 286, 289, 299 
measurement of, 310-321 

ordinal, 259-264 
and R order dense set, 263-264 
uniqueness of, 262-263, 271, 278, 
286, 289 

Utility of gambling model, 330 
Utility intervals, see Utility difference 
Utility model(s), additive, 267-272 
asymptotic, 374-375 
constant, 332-337 
expected, 281-284, 292, 327-330 
see also Herstein Milnor, Savage, and von Neumann Morgenstern utility models 
Herstein Milnor, 286-290 
probabilistic expected, 359-367 
random, definition of, 338 

necessary and sufficient conditions 
for, 352-353 

relations to other concepts, 344, 
346-349, 356-358 
random expected, definition of, 361 
experimental tests of, 397-400 
relation to other concepts, 361- 
362 

Savage, 298-299 
strict, definition of, 335 
experimental tests of, 397-400 
relations to other concepts, 335- 
336, 344, 350, 354-356, 379 
strict expected, definition of, 360 
experimental test of, 397-400 
relations to other concepts, 361- 
362, 373-374 
strong, definition of, 334 
relations to other concepts, 334-335, 339, 344, 330 
strong expected, definition of, 360 
von Neumann Morgenstern, 285- 
286, 290 

weak, definition of, 333 
experimental tests of, 380 
relations to other concepts, 334, 
339, 344, 349-350 
weak expected, definition of, 360 
Utility theory, see Utility model 
Utility of variability model, 374-375 
V scale, 152-153 
Variable, random, 421-424 
uniformly bounded, 429, 436 
Variance preference, 328-330 
Velocity of wave propagation, 16-19 
Vibration maximum in cochlea, 32-34, 
52-54, 69 

Visual acuity, 103, 106 
Visual flicker perception, 101 

see also Critical flicker frequency 
Visual photochemistry, 106-108, 124 
Visual stimulus, quantum aspects of, 
108 

Visual system, characteristics of, 104 
Von Neumann Morgenstern system of utility, 285-286, 290 
Voter’s preference, 268, 279 

Waiting lines, 437-438 
Waves, in cochlea, 32, 34 
plane, 19 

propagation of, 14-20 


Waves, reflection of, 20-21 
spherical, 19 
Wave equation, 16-17 
Wavelength, 17 

discrimination of, 122, 125 
see also Color discrimination 
WDW unit coordinates of color equivalence, 115-118 
Weak conditioning model, 3 

Weak higher ordered 
Weak law of large numbers, 429, 43 
Weak ordering relation, 260, 263, 270, 275, 285, 298 
see also Preference relation 
Weak stochastic transitivity, see Stochastic transitivity 
Weak utility model, see Utility model 
Weber’s law, 126, 129 
Webs, theory of, 272 


Zero sum game, two person, 302 

Zone of inconsistency, 3U 



